Overview

Dataset statistics

Number of variables: 41
Number of observations: 697117
Missing cells: 10023004
Missing cells (%): 35.1%
Duplicate rows: 18841
Duplicate rows (%): 2.7%
Total size in memory: 1.2 GiB
Average record size in memory: 1.8 KiB
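
The percentages and the average record size follow from the raw counts above. As a quick sanity check, the minimal sketch below recomputes them; the variable names are illustrative and not part of the report.

```python
# Recompute the derived overview figures from the raw counts in the report.
n_variables = 41
n_observations = 697_117
missing_cells = 10_023_004
duplicate_rows = 18_841

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# "1.2 GiB" is itself a rounded figure, so the per-record size is approximate.
total_bytes = 1.2 * 2**30
record_kib = total_bytes / n_observations / 1024

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {record_kib:.1f} KiB")
```

The three printed values agree with the report to the displayed precision.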

Variable types

Numeric: 1
Categorical: 29
Unsupported: 5
Boolean: 6

Alerts

Status has constant value "Active" [Constant]
Dataset has 18841 (2.7%) duplicate rows [Duplicates]
Guid has a high cardinality: 18841 distinct values [High cardinality]
FullName has a high cardinality: 18451 distinct values [High cardinality]
FirstName has a high cardinality: 11655 distinct values [High cardinality]
Surname has a high cardinality: 10229 distinct values [High cardinality]
IdNumber has a high cardinality: 14913 distinct values [High cardinality]
AllergyType has a high cardinality: 99 distinct values [High cardinality]
EmergencyContactNumber has a high cardinality: 2672 distinct values [High cardinality]
EmergencyContactFullName has a high cardinality: 2906 distinct values [High cardinality]
AlternativePickupContactNumber has a high cardinality: 626 distinct values [High cardinality]
BirthDate has a high cardinality: 2019 distinct values [High cardinality]
StartDate has a high cardinality: 811 distinct values [High cardinality]
Franchisee.Guid has a high cardinality: 3639 distinct values [High cardinality]
Caregiver.FullName has a high cardinality: 17723 distinct values [High cardinality]
Caregiver.FirstName has a high cardinality: 9230 distinct values [High cardinality]
Caregiver.Surname has a high cardinality: 9578 distinct values [High cardinality]
Caregiver.IdNumber has a high cardinality: 15284 distinct values [High cardinality]
Caregiver.ContactNumber has a high cardinality: 11312 distinct values [High cardinality]
Caregiver.Guid has a high cardinality: 18103 distinct values [High cardinality]
AllergyType is highly imbalanced (52.8%) [Imbalance]
HasAllergy is highly imbalanced (96.4%) [Imbalance]
HasDisability is highly imbalanced (97.2%) [Imbalance]
EthnicGroup is highly imbalanced (82.5%) [Imbalance]
GrantType is highly imbalanced (66.0%) [Imbalance]
InactiveReason is highly imbalanced (83.4%) [Imbalance]
Caregiver.RelationshipType is highly imbalanced (66.5%) [Imbalance]
Caregiver.HighestEducationLevel is highly imbalanced (88.4%) [Imbalance]
IdNumber has 34632 (5.0%) missing values [Missing]
AllergyType has 657971 (94.4%) missing values [Missing]
DisabilityType has 697006 (> 99.9%) missing values [Missing]
HealthConditions has 697117 (100.0%) missing values [Missing]
EmergencyContactNumber has 579161 (83.1%) missing values [Missing]
EmergencyContactFullName has 578643 (83.0%) missing values [Missing]
EmergencyContactFirstName has 697117 (100.0%) missing values [Missing]
EmergencyContactSurname has 697117 (100.0%) missing values [Missing]
AlternativePickupFirstName has 697117 (100.0%) missing values [Missing]
AlternativePickupSurname has 697117 (100.0%) missing values [Missing]
AlternativePickupContactNumber has 658970 (94.5%) missing values [Missing]
BirthDate has 62345 (8.9%) missing values [Missing]
HasDisability has 346468 (49.7%) missing values [Missing]
Gender has 31968 (4.6%) missing values [Missing]
EthnicGroup has 211714 (30.4%) missing values [Missing]
HomeLanguage has 226033 (32.4%) missing values [Missing]
GrantType has 11655 (1.7%) missing values [Missing]
InactiveReason has 647833 (92.9%) missing values [Missing]
Caregiver.IdNumber has 90058 (12.9%) missing values [Missing]
Caregiver.ContactNumber has 227587 (32.6%) missing values [Missing]
Caregiver.RelationshipType has 278832 (40.0%) missing values [Missing]
Caregiver.HighestEducationLevel has 519591 (74.5%) missing values [Missing]
Caregiver.Language has 676212 (97.0%) missing values [Missing]
Unnamed: 0 is uniformly distributed [Uniform]
Guid is uniformly distributed [Uniform]
DisabilityType is uniformly distributed [Uniform]
HealthConditions is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
EmergencyContactFirstName is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
EmergencyContactSurname is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
AlternativePickupFirstName is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
AlternativePickupSurname is an unsupported type, check if it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2023-06-13 10:55:30.528237
Analysis finished: 2023-06-13 10:56:19.285962
Duration: 48.76 seconds
Software version: pandas-profiling v3.6.6
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ)

Distinct: 18841
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9420
Minimum: 0
Maximum: 18840
Zeros: 37
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 942
Q1: 4710
Median: 9420
Q3: 14130
95-th percentile: 17898
Maximum: 18840
Range: 18840
Interquartile range (IQR): 9420

Descriptive statistics

Standard deviation: 5438.9321
Coefficient of variation (CV): 0.57738133
Kurtosis: -1.2
Mean: 9420
Median Absolute Deviation (MAD): 4710
Skewness: 0
Sum: 6.5668421 × 10^9
Variance: 29581982
Monotonicity: Not monotonic
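
These figures are consistent with the integers 0 to 18840 each occurring 37 times (18841 × 37 = 697117). A minimal sketch, assuming exactly that structure, reproduces the mean, variance, standard deviation, and coefficient of variation from the closed-form expressions for a discrete uniform distribution. The variable names are illustrative.

```python
import math

n_distinct = 18_841         # values 0 .. 18840
repeats = 37                # assumption: every value occurs 37 times
n = n_distinct * repeats    # total number of observations

mean = (n_distinct - 1) / 2
pop_var = (n_distinct**2 - 1) / 12     # variance of a discrete uniform
sample_var = pop_var * n / (n - 1)     # n - 1 denominator (Bessel's correction)
std = math.sqrt(sample_var)
cv = std / mean

print(n, mean, round(sample_var), round(std, 4), round(cv, 5))
```

The reported variance matches the sample variance with the n − 1 denominator rather than the population variance.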
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
0                     37      < 0.1%
12549                 37      < 0.1%
12565                 37      < 0.1%
12564                 37      < 0.1%
12563                 37      < 0.1%
12562                 37      < 0.1%
12561                 37      < 0.1%
12560                 37      < 0.1%
12559                 37      < 0.1%
12558                 37      < 0.1%
Other values (18831)  696747  99.9%

Value  Count  Frequency (%)
0      37     < 0.1%
1      37     < 0.1%
2      37     < 0.1%
3      37     < 0.1%
4      37     < 0.1%
5      37     < 0.1%
6      37     < 0.1%
7      37     < 0.1%
8      37     < 0.1%
9      37     < 0.1%

Value  Count  Frequency (%)
18840  37     < 0.1%
18839  37     < 0.1%
18838  37     < 0.1%
18837  37     < 0.1%
18836  37     < 0.1%
18835  37     < 0.1%
18834  37     < 0.1%
18833  37     < 0.1%
18832  37     < 0.1%
18831  37     < 0.1%

Guid
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 18841
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 61.8 MiB

Value                                 Count
0605e301-a345-ea11-833a-00155d326100  37
f3461cb9-3d49-ec11-834d-00155d326100  37
19bf40c4-ce49-ec11-834d-00155d326100  37
4dc71855-ce49-ec11-834d-00155d326100  37
eade4824-cc49-ec11-834d-00155d326100  37
Other values (18836)                  696932

Length

Max length: 36
Median length: 36
Mean length: 36
Min length: 36

Characters and Unicode

Total characters: 25096212
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
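
To illustrate what these character properties look like, the sketch below uses Python's standard `unicodedata` module to count Unicode general categories in the first sample value of this variable. It only approximates the report's breakdown: the profiler's own implementation, and its script and block lookups, are not reproduced here.

```python
import unicodedata
from collections import Counter

guid = "0605e301-a345-ea11-833a-00155d326100"  # first sample row of Guid

# Map each character to its Unicode general category:
# Nd = Decimal Number, Ll = Lowercase Letter, Pd = Dash Punctuation.
categories = Counter(unicodedata.category(ch) for ch in guid)

print(len(guid))           # length of the value
print(sorted(categories))  # distinct general categories in this value
```

This single value already contains the three categories the report lists for the whole column.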

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0605e301-a345-ea11-833a-00155d326100
2nd row: 5c1e6f1a-bc45-ea11-833a-00155d326100
3rd row: 5637445f-eb45-ea11-833a-00155d326100
4th row: 4da208b6-fa45-ea11-833a-00155d326100
5th row: cdb4a38c-4f46-ea11-833a-00155d326100

Common Values

Value                                 Count   Frequency (%)
0605e301-a345-ea11-833a-00155d326100  37      < 0.1%
f3461cb9-3d49-ec11-834d-00155d326100  37      < 0.1%
19bf40c4-ce49-ec11-834d-00155d326100  37      < 0.1%
4dc71855-ce49-ec11-834d-00155d326100  37      < 0.1%
eade4824-cc49-ec11-834d-00155d326100  37      < 0.1%
f4176c59-cb49-ec11-834d-00155d326100  37      < 0.1%
2236f3d8-ca49-ec11-834d-00155d326100  37      < 0.1%
3b8a7756-ca49-ec11-834d-00155d326100  37      < 0.1%
6d0928d9-c949-ec11-834d-00155d326100  37      < 0.1%
804f9214-c749-ec11-834d-00155d326100  37      < 0.1%
Other values (18831)                  696747  99.9%

Length

Histogram of lengths of the category
Value                                 Count   Frequency (%)
0605e301-a345-ea11-833a-00155d326100  37      < 0.1%
e53d020c-6b46-ea11-833a-00155d326100  37      < 0.1%
cdb4a38c-4f46-ea11-833a-00155d326100  37      < 0.1%
2b427474-5046-ea11-833a-00155d326100  37      < 0.1%
52abcbd9-5046-ea11-833a-00155d326100  37      < 0.1%
6e17db16-5146-ea11-833a-00155d326100  37      < 0.1%
6aab0708-5246-ea11-833a-00155d326100  37      < 0.1%
63745080-5246-ea11-833a-00155d326100  37      < 0.1%
941ae956-5446-ea11-833a-00155d326100  37      < 0.1%
53a902d7-5446-ea11-833a-00155d326100  37      < 0.1%
Other values (18831)                  696747  99.9%

Most occurring characters

Value             Count    Frequency (%)
0                 3391161  13.5%
1                 3348463  13.3%
-                 2788468  11.1%
5                 2148405  8.6%
3                 1962702  7.8%
8                 1320789  5.3%
6                 1313611  5.2%
d                 1288784  5.1%
e                 1198763  4.8%
2                 1198319  4.8%
Other values (7)  5136747  20.5%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    16944742  67.5%
Lowercase Letter  5363002   21.4%
Dash Punctuation  2788468   11.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 3391161
20.0%
1 3348463
19.8%
5 2148405
12.7%
3 1962702
11.6%
8 1320789
 
7.8%
6 1313611
 
7.8%
2 1198319
 
7.1%
4 968808
 
5.7%
9 714248
 
4.2%
7 578236
 
3.4%
Lowercase Letter
ValueCountFrequency (%)
d 1288784
24.0%
e 1198763
22.4%
b 878898
16.4%
c 856624
16.0%
a 646612
12.1%
f 493321
 
9.2%
Dash Punctuation
ValueCountFrequency (%)
- 2788468
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 19733210
78.6%
Latin 5363002
 
21.4%

Most frequent character per script

Common
ValueCountFrequency (%)
0 3391161
17.2%
1 3348463
17.0%
- 2788468
14.1%
5 2148405
10.9%
3 1962702
9.9%
8 1320789
 
6.7%
6 1313611
 
6.7%
2 1198319
 
6.1%
4 968808
 
4.9%
9 714248
 
3.6%
Latin
ValueCountFrequency (%)
d 1288784
24.0%
e 1198763
22.4%
b 878898
16.4%
c 856624
16.0%
a 646612
12.1%
f 493321
 
9.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 25096212
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 3391161
13.5%
1 3348463
13.3%
- 2788468
11.1%
5 2148405
8.6%
3 1962702
 
7.8%
8 1320789
 
5.3%
6 1313611
 
5.2%
d 1288784
 
5.1%
e 1198763
 
4.8%
2 1198319
 
4.8%
Other values (7) 5136747
20.5%

FullName
Categorical

Distinct: 18451
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 49.9 MiB
Thabo Macala
 
333
Siyabonga Khumalo
 
148
Enzokuhle Ngcobo
 
148
Asande Ndlovu
 
148
Hlompo Gaosekwe
 
148
Other values (18446)
696192 

Length

Max length: 61
Median length: 46
Mean length: 18.012101
Min length: 3

Characters and Unicode

Total characters: 12556542
Distinct characters: 84
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowMxolisi komani
2nd rowThateho Ramohlabi
3rd rowShenaaze van wyk
4th rowLeatitia Zona
5th rowAvandro Pieter Klaaste

Common Values

ValueCountFrequency (%)
Thabo Macala 333
 
< 0.1%
Siyabonga Khumalo 148
 
< 0.1%
Enzokuhle Ngcobo 148
 
< 0.1%
Asande Ndlovu 148
 
< 0.1%
Hlompo Gaosekwe 148
 
< 0.1%
Sukoluhle Ngubane 148
 
< 0.1%
Lethokuhle Lethokuhle 111
 
< 0.1%
Andani Netshivhale 111
 
< 0.1%
Asenathi Cele 111
 
< 0.1%
Mpho Spandiel 111
 
< 0.1%
Other values (18441) 695600
99.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
dlamini 6697
 
0.4%
junior 6549
 
0.4%
enzokuhle 6512
 
0.4%
blessing 6031
 
0.4%
melokuhle 5661
 
0.3%
lethabo 5328
 
0.3%
ndlovu 5106
 
0.3%
ngubane 4921
 
0.3%
lubanzi 4588
 
0.3%
sithole 4255
 
0.3%
Other values (16591) 1581047
96.6%

Most occurring characters

ValueCountFrequency (%)
a 1357419
 
10.8%
e 1171494
 
9.3%
944980
 
7.5%
o 897805
 
7.2%
i 802789
 
6.4%
l 796129
 
6.3%
n 780626
 
6.2%
h 606948
 
4.8%
u 436045
 
3.5%
s 403189
 
3.2%
Other values (74) 4359118
34.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9926138
79.1%
Uppercase Letter 1673880
 
13.3%
Space Separator 944980
 
7.5%
Dash Punctuation 4329
 
< 0.1%
Decimal Number 3885
 
< 0.1%
Other Punctuation 2516
 
< 0.1%
Control 444
 
< 0.1%
Modifier Symbol 185
 
< 0.1%
Connector Punctuation 74
 
< 0.1%
Open Punctuation 37
 
< 0.1%
Other values (2) 74
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1357419
13.7%
e 1171494
11.8%
o 897805
 
9.0%
i 802789
 
8.1%
l 796129
 
8.0%
n 780626
 
7.9%
h 606948
 
6.1%
u 436045
 
4.4%
s 403189
 
4.1%
t 402190
 
4.1%
Other values (23) 2271504
22.9%
Uppercase Letter
ValueCountFrequency (%)
M 318570
19.0%
S 157953
 
9.4%
N 151071
 
9.0%
L 136530
 
8.2%
A 125615
 
7.5%
K 104377
 
6.2%
T 90872
 
5.4%
B 75776
 
4.5%
O 67118
 
4.0%
P 48137
 
2.9%
Other values (17) 397861
23.8%
Decimal Number
ValueCountFrequency (%)
1 999
25.7%
0 962
24.8%
8 407
10.5%
7 370
 
9.5%
2 370
 
9.5%
9 259
 
6.7%
3 222
 
5.7%
5 148
 
3.8%
4 111
 
2.9%
6 37
 
1.0%
Other Punctuation
ValueCountFrequency (%)
. 962
38.2%
' 777
30.9%
, 444
17.6%
/ 296
 
11.8%
? 37
 
1.5%
Modifier Symbol
ValueCountFrequency (%)
` 148
80.0%
🏾 37
 
20.0%
Space Separator
ValueCountFrequency (%)
944980
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4329
100.0%
Control
ValueCountFrequency (%)
444
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 74
100.0%
Open Punctuation
ValueCountFrequency (%)
( 37
100.0%
Close Punctuation
ValueCountFrequency (%)
) 37
100.0%
Other Symbol
ValueCountFrequency (%)
👋 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11600018
92.4%
Common 956524
 
7.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1357419
 
11.7%
e 1171494
 
10.1%
o 897805
 
7.7%
i 802789
 
6.9%
l 796129
 
6.9%
n 780626
 
6.7%
h 606948
 
5.2%
u 436045
 
3.8%
s 403189
 
3.5%
t 402190
 
3.5%
Other values (50) 3945384
34.0%
Common
ValueCountFrequency (%)
944980
98.8%
- 4329
 
0.5%
1 999
 
0.1%
. 962
 
0.1%
0 962
 
0.1%
' 777
 
0.1%
444
 
< 0.1%
, 444
 
< 0.1%
8 407
 
< 0.1%
7 370
 
< 0.1%
Other values (14) 1850
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12555913
> 99.9%
None 629
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 1357419
 
10.8%
e 1171494
 
9.3%
944980
 
7.5%
o 897805
 
7.2%
i 802789
 
6.4%
l 796129
 
6.3%
n 780626
 
6.2%
h 606948
 
4.8%
u 436045
 
3.5%
s 403189
 
3.2%
Other values (64) 4358489
34.7%
None
ValueCountFrequency (%)
é 148
23.5%
è 111
17.6%
à 74
11.8%
ë 74
11.8%
ç 37
 
5.9%
Ñ 37
 
5.9%
ķ 37
 
5.9%
ñ 37
 
5.9%
👋 37
 
5.9%
🏾 37
 
5.9%

FirstName
Categorical

Distinct: 11655
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 44.8 MiB
Melokuhle
 
2923
Enzokuhle
 
2886
Lethabo
 
2405
Lesedi
 
1850
Mpho
 
1850
Other values (11650)
685203 

Length

Max length: 46
Median length: 33
Mean length: 10.366966
Min length: 1

Characters and Unicode

Total characters: 7226988
Distinct characters: 80
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowMxolisi
2nd rowThateho
3rd rowShenaaze
4th rowLeatitia
5th rowAvandro Pieter

Common Values

ValueCountFrequency (%)
Melokuhle 2923
 
0.4%
Enzokuhle 2886
 
0.4%
Lethabo 2405
 
0.3%
Lesedi 1850
 
0.3%
Mpho 1850
 
0.3%
Karabo 1850
 
0.3%
Omphile 1702
 
0.2%
Rethabile 1628
 
0.2%
Bokamoso 1591
 
0.2%
Alunamda 1554
 
0.2%
Other values (11645) 676878
97.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
enzokuhle 6253
 
0.7%
junior 5883
 
0.6%
melokuhle 5402
 
0.6%
blessing 5328
 
0.6%
lethabo 5143
 
0.6%
lubanzi 4292
 
0.5%
karabo 3848
 
0.4%
lesedi 3478
 
0.4%
lethokuhle 3404
 
0.4%
omphile 3293
 
0.4%
Other values (8553) 881451
95.0%

Most occurring characters

ValueCountFrequency (%)
e 741036
 
10.3%
a 680689
 
9.4%
o 549894
 
7.6%
l 516483
 
7.1%
i 495356
 
6.9%
n 480223
 
6.6%
h 401820
 
5.6%
347208
 
4.8%
u 258038
 
3.6%
t 246827
 
3.4%
Other values (70) 2509414
34.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5926216
82.0%
Uppercase Letter 944647
 
13.1%
Space Separator 347208
 
4.8%
Dash Punctuation 3811
 
0.1%
Decimal Number 2923
 
< 0.1%
Other Punctuation 1665
 
< 0.1%
Control 259
 
< 0.1%
Connector Punctuation 74
 
< 0.1%
Modifier Symbol 74
 
< 0.1%
Other Symbol 37
 
< 0.1%
Other values (2) 74
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 741036
12.5%
a 680689
11.5%
o 549894
9.3%
l 516483
 
8.7%
i 495356
 
8.4%
n 480223
 
8.1%
h 401820
 
6.8%
u 258038
 
4.4%
t 246827
 
4.2%
s 243867
 
4.1%
Other values (21) 1311983
22.1%
Uppercase Letter
ValueCountFrequency (%)
L 110260
11.7%
A 107559
11.4%
S 90576
9.6%
M 79439
 
8.4%
K 67710
 
7.2%
N 66156
 
7.0%
T 59385
 
6.3%
O 58386
 
6.2%
B 48137
 
5.1%
E 33744
 
3.6%
Other values (17) 223295
23.6%
Decimal Number
ValueCountFrequency (%)
1 851
29.1%
0 666
22.8%
7 333
 
11.4%
8 259
 
8.9%
9 222
 
7.6%
2 185
 
6.3%
3 148
 
5.1%
4 111
 
3.8%
5 111
 
3.8%
6 37
 
1.3%
Other Punctuation
ValueCountFrequency (%)
. 740
44.4%
' 592
35.6%
, 333
20.0%
Modifier Symbol
ValueCountFrequency (%)
` 37
50.0%
🏾 37
50.0%
Space Separator
ValueCountFrequency (%)
347208
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3811
100.0%
Control
ValueCountFrequency (%)
259
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 74
100.0%
Other Symbol
ValueCountFrequency (%)
👋 37
100.0%
Close Punctuation
ValueCountFrequency (%)
) 37
100.0%
Open Punctuation
ValueCountFrequency (%)
( 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6870863
95.1%
Common 356125
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 741036
 
10.8%
a 680689
 
9.9%
o 549894
 
8.0%
l 516483
 
7.5%
i 495356
 
7.2%
n 480223
 
7.0%
h 401820
 
5.8%
u 258038
 
3.8%
t 246827
 
3.6%
s 243867
 
3.5%
Other values (48) 2256630
32.8%
Common
ValueCountFrequency (%)
347208
97.5%
- 3811
 
1.1%
1 851
 
0.2%
. 740
 
0.2%
0 666
 
0.2%
' 592
 
0.2%
, 333
 
0.1%
7 333
 
0.1%
259
 
0.1%
8 259
 
0.1%
Other values (12) 1073
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7226507
> 99.9%
None 481
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 741036
 
10.3%
a 680689
 
9.4%
o 549894
 
7.6%
l 516483
 
7.1%
i 495356
 
6.9%
n 480223
 
6.6%
h 401820
 
5.6%
347208
 
4.8%
u 258038
 
3.6%
t 246827
 
3.4%
Other values (62) 2508933
34.7%
None
ValueCountFrequency (%)
é 148
30.8%
è 111
23.1%
👋 37
 
7.7%
ķ 37
 
7.7%
à 37
 
7.7%
ë 37
 
7.7%
Ñ 37
 
7.7%
🏾 37
 
7.7%

Surname
Categorical

Distinct: 10229
Distinct (%): 1.5%
Missing: 370
Missing (%): 0.1%
Memory size: 42.5 MiB
Dlamini
 
5587
Ndlovu
 
3663
Mahlangu
 
2923
Sithole
 
2812
Ngubane
 
2479
Other values (10224)
679283 

Length

Max length: 30
Median length: 28
Mean length: 6.8755244
Min length: 1

Characters and Unicode

Total characters: 4790501
Distinct characters: 72
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowkomani
2nd rowRamohlabi
3rd rowvan wyk
4th rowZona
5th rowKlaaste

Common Values

ValueCountFrequency (%)
Dlamini 5587
 
0.8%
Ndlovu 3663
 
0.5%
Mahlangu 2923
 
0.4%
Sithole 2812
 
0.4%
Ngubane 2479
 
0.4%
Khumalo 2183
 
0.3%
Ngubane 2072
 
0.3%
Mokoena 2072
 
0.3%
Mbatha 2035
 
0.3%
Mkhize 1961
 
0.3%
Other values (10219) 668960
96.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
dlamini 6105
 
0.9%
ngubane 4699
 
0.7%
ndlovu 4699
 
0.7%
sithole 4033
 
0.6%
mkhize 3441
 
0.5%
mahlangu 3367
 
0.5%
khumalo 3071
 
0.4%
mbatha 2923
 
0.4%
dladla 2368
 
0.3%
mokoena 2331
 
0.3%
Other values (9604) 671513
94.8%

Most occurring characters

ValueCountFrequency (%)
a 676730
14.1%
e 430458
 
9.0%
o 347911
 
7.3%
i 307433
 
6.4%
n 300403
 
6.3%
l 279646
 
5.8%
M 239131
 
5.0%
h 205128
 
4.3%
u 178007
 
3.7%
s 159322
 
3.3%
Other values (62) 1666332
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3999922
83.5%
Uppercase Letter 728493
 
15.2%
Space Separator 59755
 
1.2%
Decimal Number 962
 
< 0.1%
Other Punctuation 555
 
< 0.1%
Dash Punctuation 518
 
< 0.1%
Control 185
 
< 0.1%
Modifier Symbol 111
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 676730
16.9%
e 430458
10.8%
o 347911
 
8.7%
i 307433
 
7.7%
n 300403
 
7.5%
l 279646
 
7.0%
h 205128
 
5.1%
u 178007
 
4.5%
s 159322
 
4.0%
t 155363
 
3.9%
Other values (20) 959521
24.0%
Uppercase Letter
ValueCountFrequency (%)
M 239131
32.8%
N 84545
 
11.6%
S 67377
 
9.2%
K 36667
 
5.0%
T 31487
 
4.3%
D 30784
 
4.2%
B 27639
 
3.8%
L 26270
 
3.6%
P 20239
 
2.8%
G 18611
 
2.6%
Other values (16) 145743
20.0%
Decimal Number
ValueCountFrequency (%)
0 296
30.8%
2 185
19.2%
8 148
15.4%
1 148
15.4%
3 74
 
7.7%
9 37
 
3.8%
5 37
 
3.8%
7 37
 
3.8%
Other Punctuation
ValueCountFrequency (%)
. 222
40.0%
' 185
33.3%
, 111
20.0%
? 37
 
6.7%
Space Separator
ValueCountFrequency (%)
59755
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 518
100.0%
Control
ValueCountFrequency (%)
185
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 111
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4728415
98.7%
Common 62086
 
1.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 676730
14.3%
e 430458
 
9.1%
o 347911
 
7.4%
i 307433
 
6.5%
n 300403
 
6.4%
l 279646
 
5.9%
M 239131
 
5.1%
h 205128
 
4.3%
u 178007
 
3.8%
s 159322
 
3.4%
Other values (46) 1604246
33.9%
Common
ValueCountFrequency (%)
59755
96.2%
- 518
 
0.8%
0 296
 
0.5%
. 222
 
0.4%
2 185
 
0.3%
' 185
 
0.3%
185
 
0.3%
8 148
 
0.2%
1 148
 
0.2%
, 111
 
0.2%
Other values (6) 333
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4790353
> 99.9%
None 148
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 676730
14.1%
e 430458
 
9.0%
o 347911
 
7.3%
i 307433
 
6.4%
n 300403
 
6.3%
l 279646
 
5.8%
M 239131
 
5.0%
h 205128
 
4.3%
u 178007
 
3.7%
s 159322
 
3.3%
Other values (58) 1666184
34.8%
None
ValueCountFrequency (%)
ç 37
25.0%
ë 37
25.0%
à 37
25.0%
ñ 37
25.0%

IdNumber
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 14913
Distinct (%): 2.3%
Missing: 34632
Missing (%): 5.0%
Memory size: 45.2 MiB

Value                 Count
0000000000012         88356
0000000000000         2812
000000000012          1813
000                   1591
0000                  1480
Other values (14908)  566433

Length

Max length: 20
Median length: 13
Mean length: 12.816867
Min length: 1

Characters and Unicode

Total characters: 8490982
Distinct characters: 34
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row0000000000012
2nd row1807095666084
3rd row0000000000012
4th row0000000000012
5th row1806226123086

Common Values

Value                 Count   Frequency (%)
0000000000012         88356   12.7%
0000000000000         2812    0.4%
000000000012          1813    0.3%
000                   1591    0.2%
0000                  1480    0.2%
000000000000          925     0.1%
0000000000            888     0.1%
00000000000           444     0.1%
0000000000123         296     < 0.1%
0000000000001         185     < 0.1%
Other values (14903)  563695  80.9%
(Missing)             34632   5.0%
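
The ten most frequent values listed here are all zero-padded placeholders rather than distinct identifiers. One possible way to flag such values before further analysis is sketched below. The function name `is_placeholder_id` and the rule itself (at least three leading zeros followed by at most three more digits, or a literal "none") are assumptions chosen to match the values listed above, not something the report prescribes.

```python
import re

# Assumed rule: 3+ leading zeros followed by at most 3 further digits.
_PLACEHOLDER = re.compile(r"0{3,}\d{0,3}")

def is_placeholder_id(value):
    """Return True for missing or placeholder-looking ID strings."""
    if value is None:
        return True
    text = value.strip()
    if text.lower() in {"", "none"}:
        return True
    # fullmatch requires the whole string to fit the placeholder pattern.
    return _PLACEHOLDER.fullmatch(text) is not None

# Hypothetical inputs: the first three mirror the placeholders above,
# the last is a made-up 13-digit string that should not be flagged.
for candidate in ["0000000000012", "000", "none", "1234567890123"]:
    print(candidate, is_placeholder_id(candidate))
```

A rule like this should be reviewed against the real identifier format before use, since a valid identifier could in principle begin with several zeros.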

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0000000000012 88356
 
13.3%
0000000000000 2812
 
0.4%
000000000012 1813
 
0.3%
000 1591
 
0.2%
0000 1480
 
0.2%
000000000000 925
 
0.1%
0000000000 888
 
0.1%
00000000000 444
 
0.1%
0000000000123 296
 
< 0.1%
none 259
 
< 0.1%
Other values (14907) 563806
85.1%

Most occurring characters

ValueCountFrequency (%)
0 2755390
32.5%
1 1419098
16.7%
8 1059421
 
12.5%
2 699781
 
8.2%
5 496688
 
5.8%
7 456099
 
5.4%
6 450734
 
5.3%
9 433011
 
5.1%
3 368964
 
4.3%
4 332926
 
3.9%
Other values (24) 18870
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8472112
99.8%
Other Punctuation 13801
 
0.2%
Dash Punctuation 2553
 
< 0.1%
Lowercase Letter 1147
 
< 0.1%
Uppercase Letter 1073
 
< 0.1%
Space Separator 259
 
< 0.1%
Modifier Symbol 37
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 370
34.5%
B 148
 
13.8%
A 148
 
13.8%
M 111
 
10.3%
F 74
 
6.9%
S 37
 
3.4%
R 37
 
3.4%
C 37
 
3.4%
D 37
 
3.4%
J 37
 
3.4%
Decimal Number
ValueCountFrequency (%)
0 2755390
32.5%
1 1419098
16.8%
8 1059421
 
12.5%
2 699781
 
8.3%
5 496688
 
5.9%
7 456099
 
5.4%
6 450734
 
5.3%
9 433011
 
5.1%
3 368964
 
4.4%
4 332926
 
3.9%
Lowercase Letter
ValueCountFrequency (%)
n 333
29.0%
o 333
29.0%
e 296
25.8%
h 37
 
3.2%
b 37
 
3.2%
z 37
 
3.2%
w 37
 
3.2%
v 37
 
3.2%
Other Punctuation
ValueCountFrequency (%)
/ 13764
99.7%
. 37
 
0.3%
Dash Punctuation
ValueCountFrequency (%)
- 2553
100.0%
Space Separator
ValueCountFrequency (%)
259
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 8488762
> 99.9%
Latin 2220
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 370
16.7%
n 333
15.0%
o 333
15.0%
e 296
13.3%
B 148
 
6.7%
A 148
 
6.7%
M 111
 
5.0%
F 74
 
3.3%
S 37
 
1.7%
R 37
 
1.7%
Other values (9) 333
15.0%
Common
ValueCountFrequency (%)
0 2755390
32.5%
1 1419098
16.7%
8 1059421
 
12.5%
2 699781
 
8.2%
5 496688
 
5.9%
7 456099
 
5.4%
6 450734
 
5.3%
9 433011
 
5.1%
3 368964
 
4.3%
4 332926
 
3.9%
Other values (5) 16650
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8490982
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2755390
32.5%
1 1419098
16.7%
8 1059421
 
12.5%
2 699781
 
8.2%
5 496688
 
5.8%
7 456099
 
5.4%
6 450734
 
5.3%
9 433011
 
5.1%
3 368964
 
4.3%
4 332926
 
3.9%
Other values (24) 18870
 
0.2%

AllergyType
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct: 99
Distinct (%): 0.3%
Missing: 657971
Missing (%): 94.4%
Memory size: 22.4 MiB

Value              Count
None               11803
none               9176
no                 7252
No                 5069
None               814
Other values (94)  5032

Length

Max length: 62
Median length: 4
Mean length: 4.210775
Min length: 2

Characters and Unicode

Total characters: 164835
Distinct characters: 57
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: She must not be exposed to the sun as her nose starts bleeding
2nd row: None
3rd row: None
4th row: None
5th row: No

Common Values

Value              Count   Frequency (%)
None               11803   1.7%
none               9176    1.3%
no                 7252    1.0%
No                 5069    0.7%
None               814     0.1%
None listed        555     0.1%
NONE               259     < 0.1%
None Listed        222     < 0.1%
Eczema             148     < 0.1%
NO                 111     < 0.1%
Other values (89)  3737    0.5%
(Missing)          657971  94.4%
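
Most of the frequent values are different spellings of "no allergy" (None, none, no, No, NONE, None listed), and None appears twice with different counts, which may point to a whitespace variant. A minimal sketch of one way to collapse them follows. The helper name `normalise_allergy` and the set of strings treated as "no allergy" are assumptions for illustration only.

```python
# Assumed set of strings that all mean "no allergy recorded".
NO_ALLERGY = {"none", "no", "none listed"}

def normalise_allergy(value):
    """Map 'no allergy' spellings to None; tidy whitespace otherwise."""
    if value is None:
        return None
    text = " ".join(value.split())  # trim and collapse internal whitespace
    if text.lower() in NO_ALLERGY:
        return None
    return text

raw = ["None", "none", "no", "No", "None ", "None listed", "NONE", "Eczema"]
cleaned = [normalise_allergy(v) for v in raw]
print(cleaned)
```

Whether an explicit "none" should be merged with a genuinely missing value is a judgement call. This sketch maps both to `None`, which discards that distinction.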

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none 23051
53.1%
no 12580
29.0%
listed 962
 
2.2%
to 296
 
0.7%
eczema 222
 
0.5%
tin 185
 
0.4%
and 185
 
0.4%
sinus 185
 
0.4%
allergies 185
 
0.4%
fish 185
 
0.4%
Other values (109) 5402
 
12.4%

Most occurring characters

ValueCountFrequency (%)
n 41773
25.3%
o 36926
22.4%
e 27898
16.9%
N 19536
11.9%
5587
 
3.4%
s 4107
 
2.5%
i 3589
 
2.2%
t 3145
 
1.9%
a 2997
 
1.8%
l 2442
 
1.5%
Other values (47) 16835
10.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 134458
81.6%
Uppercase Letter 23865
 
14.5%
Space Separator 5587
 
3.4%
Other Punctuation 370
 
0.2%
Decimal Number 333
 
0.2%
Control 148
 
0.1%
Open Punctuation 37
 
< 0.1%
Close Punctuation 37
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 41773
31.1%
o 36926
27.5%
e 27898
20.7%
s 4107
 
3.1%
i 3589
 
2.7%
t 3145
 
2.3%
a 2997
 
2.2%
l 2442
 
1.8%
d 1776
 
1.3%
r 1665
 
1.2%
Other values (15) 8140
 
6.1%
Uppercase Letter
ValueCountFrequency (%)
N 19536
81.9%
S 629
 
2.6%
E 592
 
2.5%
A 444
 
1.9%
O 407
 
1.7%
B 333
 
1.4%
L 296
 
1.2%
P 259
 
1.1%
C 222
 
0.9%
D 185
 
0.8%
Other values (10) 962
 
4.0%
Decimal Number
ValueCountFrequency (%)
2 148
44.4%
0 74
22.2%
6 37
 
11.1%
5 37
 
11.1%
9 37
 
11.1%
Other Punctuation
ValueCountFrequency (%)
, 185
50.0%
/ 148
40.0%
& 37
 
10.0%
Space Separator
ValueCountFrequency (%)
5587
100.0%
Control
ValueCountFrequency (%)
148
100.0%
Open Punctuation
ValueCountFrequency (%)
( 37
100.0%
Close Punctuation
ValueCountFrequency (%)
) 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 158323
96.0%
Common 6512
 
4.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 41773
26.4%
o 36926
23.3%
e 27898
17.6%
N 19536
12.3%
s 4107
 
2.6%
i 3589
 
2.3%
t 3145
 
2.0%
a 2997
 
1.9%
l 2442
 
1.5%
d 1776
 
1.1%
Other values (35) 14134
 
8.9%
Common
ValueCountFrequency (%)
5587
85.8%
, 185
 
2.8%
148
 
2.3%
2 148
 
2.3%
/ 148
 
2.3%
0 74
 
1.1%
( 37
 
0.6%
) 37
 
0.6%
6 37
 
0.6%
5 37
 
0.6%
Other values (2) 74
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 164835
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 41773
25.3%
o 36926
22.4%
e 27898
16.9%
N 19536
11.9%
5587
 
3.4%
s 4107
 
2.5%
i 3589
 
2.2%
t 3145
 
1.9%
a 2997
 
1.8%
l 2442
 
1.5%
Other values (47) 16835
10.2%

DisabilityType
Categorical

MISSING  UNIFORM 

Distinct: 3
Distinct (%): 2.7%
Missing: 697006
Missing (%): > 99.9%
Memory size: 21.3 MiB

Value            Count
no               37
Chronic Illness  37
Ashtma           37

Length

Max length: 15
Median length: 6
Mean length: 7.6666667
Min length: 2

Characters and Unicode

Total characters: 851
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: no
2nd row: Chronic Illness
3rd row: Ashtma
4th row: no
5th row: Chronic Illness

Common Values

Value            Count   Frequency (%)
no               37      < 0.1%
Chronic Illness  37      < 0.1%
Ashtma           37      < 0.1%
(Missing)        697006  > 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
no       37     25.0%
chronic  37     25.0%
illness  37     25.0%
ashtma   37     25.0%

Most occurring characters

ValueCountFrequency (%)
n 111
13.0%
s 111
13.0%
o 74
 
8.7%
h 74
 
8.7%
l 74
 
8.7%
C 37
 
4.3%
r 37
 
4.3%
i 37
 
4.3%
c 37
 
4.3%
37
 
4.3%
Other values (6) 222
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 703
82.6%
Uppercase Letter 111
 
13.0%
Space Separator 37
 
4.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 111
15.8%
s 111
15.8%
o 74
10.5%
h 74
10.5%
l 74
10.5%
r 37
 
5.3%
i 37
 
5.3%
c 37
 
5.3%
e 37
 
5.3%
t 37
 
5.3%
Other values (2) 74
10.5%
Uppercase Letter
ValueCountFrequency (%)
C 37
33.3%
I 37
33.3%
A 37
33.3%
Space Separator
ValueCountFrequency (%)
37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 814
95.7%
Common 37
 
4.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 111
13.6%
s 111
13.6%
o 74
 
9.1%
h 74
 
9.1%
l 74
 
9.1%
C 37
 
4.5%
r 37
 
4.5%
i 37
 
4.5%
c 37
 
4.5%
I 37
 
4.5%
Other values (5) 185
22.7%
Common
ValueCountFrequency (%)
37
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 851
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 111
13.0%
s 111
13.0%
o 74
 
8.7%
h 74
 
8.7%
l 74
 
8.7%
C 37
 
4.3%
r 37
 
4.3%
i 37
 
4.3%
c 37
 
4.3%
37
 
4.3%
Other values (6) 222
26.1%

HealthConditions
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 697117
Missing (%): 100.0%
Memory size: 5.3 MiB

EmergencyContactNumber
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 2672
Distinct (%): 2.3%
Missing: 579161
Missing (%): 83.1%
Memory size: 25.2 MiB
0
 
5439
0681145763
 
2368
0648747951
 
259
0726014177
 
259
0760251247
 
222
Other values (2667)
109409 

Length

Max length: 20
Median length: 10
Mean length: 9.5890841
Min length: 1

Characters and Unicode

Total characters: 1131090
Distinct characters: 53
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row0635118027
2nd row0714248050
3rd row0625698598
4th row0769598598
5th row0738862330

Common Values

ValueCountFrequency (%)
0 5439
 
0.8%
0681145763 2368
 
0.3%
0648747951 259
 
< 0.1%
0726014177 259
 
< 0.1%
0760251247 222
 
< 0.1%
0799619663 222
 
< 0.1%
0818364480 185
 
< 0.1%
0787015537 185
 
< 0.1%
0790268989 185
 
< 0.1%
0718389466 185
 
< 0.1%
Other values (2662) 108447
 
15.6%
(Missing) 579161
83.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0 5439
 
4.6%
0681145763 2368
 
2.0%
0648747951 259
 
0.2%
0726014177 259
 
0.2%
0760251247 222
 
0.2%
0799619663 222
 
0.2%
0818364480 185
 
0.2%
0787015537 185
 
0.2%
0790268989 185
 
0.2%
0718389466 185
 
0.2%
Other values (2687) 109520
92.0%

Most occurring characters

ValueCountFrequency (%)
0 199023
17.6%
7 147149
13.0%
6 122507
10.8%
8 102601
9.1%
2 99641
8.8%
1 98457
8.7%
3 95127
8.4%
4 86987
7.7%
9 85100
7.5%
5 81178
7.2%
Other values (43) 13320
 
1.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1117770
98.8%
Lowercase Letter 10027
 
0.9%
Uppercase Letter 1924
 
0.2%
Space Separator 1295
 
0.1%
Other Punctuation 37
 
< 0.1%
Close Punctuation 37
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1517
15.1%
a 1406
14.0%
o 999
10.0%
l 925
9.2%
n 666
 
6.6%
t 666
 
6.6%
i 629
 
6.3%
h 518
 
5.2%
s 407
 
4.1%
k 296
 
3.0%
Other values (12) 1998
19.9%
Uppercase Letter
ValueCountFrequency (%)
S 518
26.9%
M 296
15.4%
L 185
 
9.6%
A 148
 
7.7%
T 111
 
5.8%
G 74
 
3.8%
N 74
 
3.8%
C 74
 
3.8%
O 74
 
3.8%
P 74
 
3.8%
Other values (8) 296
15.4%
Decimal Number
ValueCountFrequency (%)
0 199023
17.8%
7 147149
13.2%
6 122507
11.0%
8 102601
9.2%
2 99641
8.9%
1 98457
8.8%
3 95127
8.5%
4 86987
7.8%
9 85100
7.6%
5 81178
7.3%
Space Separator
ValueCountFrequency (%)
1295
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 37
100.0%
Close Punctuation
ValueCountFrequency (%)
) 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1119139
98.9%
Latin 11951
 
1.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1517
12.7%
a 1406
 
11.8%
o 999
 
8.4%
l 925
 
7.7%
n 666
 
5.6%
t 666
 
5.6%
i 629
 
5.3%
S 518
 
4.3%
h 518
 
4.3%
s 407
 
3.4%
Other values (30) 3700
31.0%
Common
ValueCountFrequency (%)
0 199023
17.8%
7 147149
13.1%
6 122507
10.9%
8 102601
9.2%
2 99641
8.9%
1 98457
8.8%
3 95127
8.5%
4 86987
7.8%
9 85100
7.6%
5 81178
7.3%
Other values (3) 1369
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1131090
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 199023
17.6%
7 147149
13.0%
6 122507
10.8%
8 102601
9.1%
2 99641
8.8%
1 98457
8.7%
3 95127
8.4%
4 86987
7.7%
9 85100
7.5%
5 81178
7.2%
Other values (43) 13320
 
1.2%

EmergencyContactFullName
Categorical

HIGH CARDINALITY  MISSING 

Distinct2906
Distinct (%)2.5%
Missing578643
Missing (%)83.0%
Memory size25.6 MiB
Thandiwe
 
296
Kgomotso
 
259
Lizzy
 
259
Sandile Mvelase
 
222
Bongekile Ximba
 
222
Other values (2901)
117216 

Length

Max length38
Median length27
Mean length13.54466
Min length2

Characters and Unicode

Total characters 1604690
Distinct characters 71
Distinct categories 8
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowHans Koopman
2nd rowValencia Van Wyk
3rd rowEugene Louw
4th rowLeandre Koopman
5th rowFilicia Dawid

Common Values

ValueCountFrequency (%)
Thandiwe 296
 
< 0.1%
Kgomotso 259
 
< 0.1%
Lizzy 259
 
< 0.1%
Sandile Mvelase 222
 
< 0.1%
Bongekile Ximba 222
 
< 0.1%
Kelebogile 222
 
< 0.1%
Lerato 185
 
< 0.1%
Veronica 185
 
< 0.1%
Zandile 185
 
< 0.1%
Nompumelelo 185
 
< 0.1%
Other values (2896) 116254
 
16.7%
(Missing) 578643
83.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ngubane 2664
 
1.3%
dlamini 1369
 
0.6%
mkhize 1258
 
0.6%
ndlovu 1184
 
0.6%
sithole 1110
 
0.5%
dladla 1036
 
0.5%
mbatha 1036
 
0.5%
khumalo 925
 
0.4%
zandile 888
 
0.4%
zuma 851
 
0.4%
Other values (3090) 199615
94.2%

Most occurring characters

ValueCountFrequency (%)
a 164058
 
10.2%
e 161283
 
10.1%
138380
 
8.6%
i 123950
 
7.7%
o 106856
 
6.7%
l 97532
 
6.1%
n 97162
 
6.1%
h 65971
 
4.1%
s 52651
 
3.3%
u 47064
 
2.9%
Other values (61) 549783
34.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1250748
77.9%
Uppercase Letter 212787
 
13.3%
Space Separator 138380
 
8.6%
Decimal Number 1628
 
0.1%
Dash Punctuation 629
 
< 0.1%
Other Punctuation 444
 
< 0.1%
Open Punctuation 37
 
< 0.1%
Close Punctuation 37
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 164058
13.1%
e 161283
12.9%
i 123950
9.9%
o 106856
 
8.5%
l 97532
 
7.8%
n 97162
 
7.8%
h 65971
 
5.3%
s 52651
 
4.2%
u 47064
 
3.8%
t 45843
 
3.7%
Other values (16) 288378
23.1%
Uppercase Letter
ValueCountFrequency (%)
M 43586
20.5%
N 27639
13.0%
S 21608
10.2%
T 13209
 
6.2%
K 12876
 
6.1%
L 10323
 
4.9%
B 10175
 
4.8%
D 9731
 
4.6%
Z 7400
 
3.5%
P 7326
 
3.4%
Other values (16) 48914
23.0%
Decimal Number
ValueCountFrequency (%)
0 370
22.7%
3 296
18.2%
1 222
13.6%
5 185
11.4%
2 148
 
9.1%
9 111
 
6.8%
4 74
 
4.5%
7 74
 
4.5%
6 74
 
4.5%
8 74
 
4.5%
Other Punctuation
ValueCountFrequency (%)
. 259
58.3%
; 74
 
16.7%
' 37
 
8.3%
, 37
 
8.3%
/ 37
 
8.3%
Space Separator
ValueCountFrequency (%)
138380
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 629
100.0%
Open Punctuation
ValueCountFrequency (%)
( 37
100.0%
Close Punctuation
ValueCountFrequency (%)
) 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1463535
91.2%
Common 141155
 
8.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 164058
 
11.2%
e 161283
 
11.0%
i 123950
 
8.5%
o 106856
 
7.3%
l 97532
 
6.7%
n 97162
 
6.6%
h 65971
 
4.5%
s 52651
 
3.6%
u 47064
 
3.2%
t 45843
 
3.1%
Other values (42) 501165
34.2%
Common
ValueCountFrequency (%)
138380
98.0%
- 629
 
0.4%
0 370
 
0.3%
3 296
 
0.2%
. 259
 
0.2%
1 222
 
0.2%
5 185
 
0.1%
2 148
 
0.1%
9 111
 
0.1%
; 74
 
0.1%
Other values (9) 481
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1604690
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 164058
 
10.2%
e 161283
 
10.1%
138380
 
8.6%
i 123950
 
7.7%
o 106856
 
6.7%
l 97532
 
6.1%
n 97162
 
6.1%
h 65971
 
4.1%
s 52651
 
3.3%
u 47064
 
2.9%
Other values (61) 549783
34.3%

EmergencyContactFirstName
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing697117
Missing (%)100.0%
Memory size5.3 MiB

EmergencyContactSurname
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing697117
Missing (%)100.0%
Memory size5.3 MiB

AlternativePickupFirstName
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing697117
Missing (%)100.0%
Memory size5.3 MiB

AlternativePickupSurname
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing697117
Missing (%)100.0%
Memory size5.3 MiB

AlternativePickupContactNumber
Categorical

HIGH CARDINALITY  MISSING 

Distinct626
Distinct (%)1.6%
Missing658970
Missing (%)94.5%
Memory size22.4 MiB
0
12136 
0818325688
 
222
0790268989
 
148
None
 
148
0825856457
 
148
Other values (621)
25345 

Length

Max length10
Median length10
Mean length7.0795344
Min length1

Characters and Unicode

Total characters 270063
Distinct characters 26
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row0
2nd row0764810096
3rd row0710863033
4th row0714778174
5th row0710863033

Common Values

ValueCountFrequency (%)
0 12136
 
1.7%
0818325688 222
 
< 0.1%
0790268989 148
 
< 0.1%
None 148
 
< 0.1%
0825856457 148
 
< 0.1%
0799619663 148
 
< 0.1%
0797419017 148
 
< 0.1%
0818876048 148
 
< 0.1%
0715804800 111
 
< 0.1%
0720478111 111
 
< 0.1%
Other values (616) 24679
 
3.5%
(Missing) 658970
94.5%
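
The most common non-missing entries in this column are "0" and "None", which look like placeholders rather than real contact numbers. The sketch below shows one way to separate them from usable values. It assumes a usable number is a leading 0 followed by nine digits, a pattern inferred only from the sample rows above. The helper name `clean_contact_number` is illustrative.

```python
import re

# Assumption: usable numbers are ten digits starting with 0, as in the samples.
PHONE_PATTERN = re.compile(r"0\d{9}")


def clean_contact_number(value):
    """Return the number if it matches the assumed pattern, otherwise None."""
    if value is None:
        return None
    text = value.strip()
    return text if PHONE_PATTERN.fullmatch(text) else None


# Values taken from the Sample and Common Values listings.
raw = ["0", "0764810096", "None", "0710863033"]
cleaned = [clean_contact_number(v) for v in raw]
print(cleaned)
```

Treating placeholders as missing would raise the effective missing share of the column above the reported 94.5%.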

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0 12173
31.9%
0818325688 222
 
0.6%
0790268989 148
 
0.4%
none 148
 
0.4%
0825856457 148
 
0.4%
0799619663 148
 
0.4%
0797419017 148
 
0.4%
0818876048 148
 
0.4%
0793898860 111
 
0.3%
0761751465 111
 
0.3%
Other values (615) 24642
64.6%

Most occurring characters

ValueCountFrequency (%)
0 55611
20.6%
7 37111
13.7%
6 26529
9.8%
2 25641
9.5%
8 22792
8.4%
1 22237
 
8.2%
9 21201
 
7.9%
4 20313
 
7.5%
3 19795
 
7.3%
5 17390
 
6.4%
Other values (16) 1443
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 268620
99.5%
Lowercase Letter 1110
 
0.4%
Uppercase Letter 259
 
0.1%
Space Separator 37
 
< 0.1%
Other Punctuation 37
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 185
16.7%
e 185
16.7%
o 148
13.3%
a 111
10.0%
i 74
 
6.7%
l 74
 
6.7%
h 74
 
6.7%
s 74
 
6.7%
t 74
 
6.7%
g 37
 
3.3%
Other values (2) 74
 
6.7%
Decimal Number
ValueCountFrequency (%)
0 55611
20.7%
7 37111
13.8%
6 26529
9.9%
2 25641
9.5%
8 22792
8.5%
1 22237
 
8.3%
9 21201
 
7.9%
4 20313
 
7.6%
3 19795
 
7.4%
5 17390
 
6.5%
Uppercase Letter
ValueCountFrequency (%)
N 185
71.4%
M 74
 
28.6%
Space Separator
ValueCountFrequency (%)
37
100.0%
Other Punctuation
ValueCountFrequency (%)
. 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 268694
99.5%
Latin 1369
 
0.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 185
13.5%
n 185
13.5%
e 185
13.5%
o 148
10.8%
a 111
8.1%
i 74
 
5.4%
l 74
 
5.4%
h 74
 
5.4%
s 74
 
5.4%
t 74
 
5.4%
Other values (4) 185
13.5%
Common
ValueCountFrequency (%)
0 55611
20.7%
7 37111
13.8%
6 26529
9.9%
2 25641
9.5%
8 22792
8.5%
1 22237
 
8.3%
9 21201
 
7.9%
4 20313
 
7.6%
3 19795
 
7.4%
5 17390
 
6.5%
Other values (2) 74
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 270063
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 55611
20.6%
7 37111
13.7%
6 26529
9.8%
2 25641
9.5%
8 22792
8.4%
1 22237
 
8.2%
9 21201
 
7.9%
4 20313
 
7.5%
3 19795
 
7.3%
5 17390
 
6.4%
Other values (16) 1443
 
0.5%

BirthDate
Categorical

HIGH CARDINALITY  MISSING 

Distinct2019
Distinct (%)0.3%
Missing62345
Missing (%)8.9%
Memory size48.5 MiB
2018-09-20T22:00:00Z
 
1369
2018-09-27T22:00:00Z
 
1221
2018-02-06T22:00:00Z
 
1184
2018-08-12T22:00:00Z
 
1184
2018-04-18T22:00:00Z
 
1147
Other values (2014)
628667 

Length

Max length20
Median length20
Mean length20
Min length20

Characters and Unicode

Total characters 12695440
Distinct characters 14
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row2017-02-16T22:00:00Z
2nd row2018-07-08T22:00:00Z
3rd row2016-04-10T22:00:00Z
4th row2015-06-10T22:00:00Z
5th row2018-10-07T22:00:00Z

Common Values

ValueCountFrequency (%)
2018-09-20T22:00:00Z 1369
 
0.2%
2018-09-27T22:00:00Z 1221
 
0.2%
2018-02-06T22:00:00Z 1184
 
0.2%
2018-08-12T22:00:00Z 1184
 
0.2%
2018-04-18T22:00:00Z 1147
 
0.2%
2018-04-04T22:00:00Z 1110
 
0.2%
2018-08-15T22:00:00Z 1110
 
0.2%
2018-09-04T22:00:00Z 1110
 
0.2%
2018-10-09T22:00:00Z 1110
 
0.2%
2018-09-19T22:00:00Z 1073
 
0.2%
Other values (2009) 623154
89.4%
(Missing) 62345
 
8.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2018-09-20t22:00:00z 1369
 
0.2%
2018-09-27t22:00:00z 1221
 
0.2%
2018-02-06t22:00:00z 1184
 
0.2%
2018-08-12t22:00:00z 1184
 
0.2%
2018-04-18t22:00:00z 1147
 
0.2%
2018-04-04t22:00:00z 1110
 
0.2%
2018-08-15t22:00:00z 1110
 
0.2%
2018-09-04t22:00:00z 1110
 
0.2%
2018-10-09t22:00:00z 1110
 
0.2%
2018-09-19t22:00:00z 1073
 
0.2%
Other values (2009) 623154
98.2%

Most occurring characters

ValueCountFrequency (%)
0 3988156
31.4%
2 2304730
18.2%
- 1269544
 
10.0%
: 1269544
 
10.0%
1 1143411
 
9.0%
T 634772
 
5.0%
Z 634772
 
5.0%
8 371369
 
2.9%
7 290339
 
2.3%
9 232175
 
1.8%
Other values (4) 556628
 
4.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8886808
70.0%
Dash Punctuation 1269544
 
10.0%
Other Punctuation 1269544
 
10.0%
Uppercase Letter 1269544
 
10.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 3988156
44.9%
2 2304730
25.9%
1 1143411
 
12.9%
8 371369
 
4.2%
7 290339
 
3.3%
9 232175
 
2.6%
6 173382
 
2.0%
3 147001
 
1.7%
5 121841
 
1.4%
4 114404
 
1.3%
Uppercase Letter
ValueCountFrequency (%)
T 634772
50.0%
Z 634772
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 1269544
100.0%
Other Punctuation
ValueCountFrequency (%)
: 1269544
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 11425896
90.0%
Latin 1269544
 
10.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 3988156
34.9%
2 2304730
20.2%
- 1269544
 
11.1%
: 1269544
 
11.1%
1 1143411
 
10.0%
8 371369
 
3.3%
7 290339
 
2.5%
9 232175
 
2.0%
6 173382
 
1.5%
3 147001
 
1.3%
Other values (2) 236245
 
2.1%
Latin
ValueCountFrequency (%)
T 634772
50.0%
Z 634772
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12695440
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 3988156
31.4%
2 2304730
18.2%
- 1269544
 
10.0%
: 1269544
 
10.0%
1 1143411
 
9.0%
T 634772
 
5.0%
Z 634772
 
5.0%
8 371369
 
2.9%
7 290339
 
2.3%
9 232175
 
1.8%
Other values (4) 556628
 
4.4%

StartDate
Categorical

Distinct811
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size50.5 MiB
2022-01-19T00:00:00
 
17316
2021-02-15T00:00:00
 
12358
2020-01-15T00:00:00
 
12025
2021-04-06T00:00:00
 
10730
2022-01-10T00:00:00
 
8806
Other values (806)
635882 

Length

Max length19
Median length19
Mean length19
Min length19

Characters and Unicode

Total characters 13245223
Distinct characters 13
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row2020-01-17T00:00:00
2nd row2020-01-01T00:00:00
3rd row2020-01-17T00:00:00
4th row2019-10-03T00:00:00
5th row2019-10-22T00:00:00
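
BirthDate and StartDate are both profiled as Categorical because they hold fixed-width timestamp strings: 20 characters with a trailing Z for BirthDate, 19 characters with no zone suffix for StartDate. The sketch below parses each format with the standard `datetime` module. The function names are illustrative. The final step, viewing a BirthDate in a fixed UTC+2 zone, is purely an assumption about why the values carry a time of 22:00:00, and the report does not confirm it.

```python
from datetime import datetime, timedelta, timezone


def parse_birth_date(value):
    """Parse a string such as '2017-02-16T22:00:00Z' into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def parse_start_date(value):
    """Parse a string such as '2020-01-17T00:00:00' into a naive datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")


birth = parse_birth_date("2017-02-16T22:00:00Z")
start = parse_start_date("2020-01-17T00:00:00")

# Hypothetical: show the same instant in a fixed UTC+2 zone.
local_birth = birth.astimezone(timezone(timedelta(hours=2)))
print(birth.isoformat(), start.isoformat(), local_birth.date())
```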

Common Values

ValueCountFrequency (%)
2022-01-19T00:00:00 17316
 
2.5%
2021-02-15T00:00:00 12358
 
1.8%
2020-01-15T00:00:00 12025
 
1.7%
2021-04-06T00:00:00 10730
 
1.5%
2022-01-10T00:00:00 8806
 
1.3%
2021-04-07T00:00:00 8806
 
1.3%
2021-03-01T00:00:00 8547
 
1.2%
2021-01-11T00:00:00 6771
 
1.0%
2022-01-24T00:00:00 6734
 
1.0%
2022-02-01T00:00:00 6623
 
1.0%
Other values (801) 598401
85.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2022-01-19t00:00:00 17316
 
2.5%
2021-02-15t00:00:00 12358
 
1.8%
2020-01-15t00:00:00 12025
 
1.7%
2021-04-06t00:00:00 10730
 
1.5%
2022-01-10t00:00:00 8806
 
1.3%
2021-04-07t00:00:00 8806
 
1.3%
2021-03-01t00:00:00 8547
 
1.2%
2021-01-11t00:00:00 6771
 
1.0%
2022-01-24t00:00:00 6734
 
1.0%
2022-02-01t00:00:00 6623
 
1.0%
Other values (801) 598401
85.8%

Most occurring characters

ValueCountFrequency (%)
0 5937797
44.8%
2 1960001
 
14.8%
- 1394234
 
10.5%
: 1394234
 
10.5%
1 1053427
 
8.0%
T 697117
 
5.3%
3 192511
 
1.5%
5 136271
 
1.0%
4 135309
 
1.0%
6 101047
 
0.8%
Other values (3) 243275
 
1.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 9759638
73.7%
Dash Punctuation 1394234
 
10.5%
Other Punctuation 1394234
 
10.5%
Uppercase Letter 697117
 
5.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 5937797
60.8%
2 1960001
 
20.1%
1 1053427
 
10.8%
3 192511
 
2.0%
5 136271
 
1.4%
4 135309
 
1.4%
6 101047
 
1.0%
9 85433
 
0.9%
7 79809
 
0.8%
8 78033
 
0.8%
Dash Punctuation
ValueCountFrequency (%)
- 1394234
100.0%
Other Punctuation
ValueCountFrequency (%)
: 1394234
100.0%
Uppercase Letter
ValueCountFrequency (%)
T 697117
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 12548106
94.7%
Latin 697117
 
5.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0 5937797
47.3%
2 1960001
 
15.6%
- 1394234
 
11.1%
: 1394234
 
11.1%
1 1053427
 
8.4%
3 192511
 
1.5%
5 136271
 
1.1%
4 135309
 
1.1%
6 101047
 
0.8%
9 85433
 
0.7%
Other values (2) 157842
 
1.3%
Latin
ValueCountFrequency (%)
T 697117
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13245223
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 5937797
44.8%
2 1960001
 
14.8%
- 1394234
 
10.5%
: 1394234
 
10.5%
1 1053427
 
8.0%
T 697117
 
5.3%
3 192511
 
1.5%
5 136271
 
1.0%
4 135309
 
1.0%
6 101047
 
0.8%
Other values (3) 243275
 
1.8%

HasAllergy
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size680.9 KiB
False
694490 
True
 
2627
ValueCountFrequency (%)
False 694490
99.6%
True 2627
 
0.4%

HasDisability
Boolean

IMBALANCE  MISSING 

Distinct2
Distinct (%)< 0.1%
Missing346468
Missing (%)49.7%
Memory size21.3 MiB
False
349650 
True
 
999
(Missing)
346468 
ValueCountFrequency (%)
False 349650
50.2%
True 999
 
0.1%
(Missing) 346468
49.7%
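
The IMBALANCE flag on this variable corresponds to the alert "HasDisability is highly imbalanced (97.2%)". One score that reproduces that figure from the counts above is one minus the Shannon entropy of the non-missing class distribution, normalised by the logarithm of the number of classes. The sketch below applies it. Treat the formula as an assumption about how the report derives the percentage, not as a documented definition. The function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for class counts, where H is Shannon entropy in bits."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        # Arbitrary choice for a single class; the ratio is undefined there.
        return 0.0
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))


# Non-missing counts from the HasDisability block (False, True).
has_disability = imbalance_score([349650, 999])
# Counts from the EthnicGroup block, for comparison with its 82.5% alert.
ethnic_group = imbalance_score([449994, 33263, 1369, 666, 111])
print(round(100 * has_disability, 1), round(100 * ethnic_group, 1))
```

A perfectly balanced variable scores 0 under this definition, and the score approaches 1 as a single class dominates.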
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size680.9 KiB
False
565323 
True
131794 
ValueCountFrequency (%)
False 565323
81.1%
True 131794
 
18.9%
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size680.9 KiB
False
550338 
True
146779 
ValueCountFrequency (%)
False 550338
78.9%
True 146779
 
21.1%
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size680.9 KiB
False
378695 
True
318422 
ValueCountFrequency (%)
False 378695
54.3%
True 318422
45.7%
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size680.9 KiB
True
599622 
False
97495 
ValueCountFrequency (%)
True 599622
86.0%
False 97495
 
14.0%

Gender
Categorical

Distinct2
Distinct (%)< 0.1%
Missing31968
Missing (%)4.6%
Memory size40.3 MiB
Female
337033 
Male
328116 

Length

Max length6
Median length6
Mean length5.013406
Min length4

Characters and Unicode

Total characters 3334662
Distinct characters 6
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowMale
2nd rowMale
3rd rowFemale
4th rowMale
5th rowMale

Common Values

ValueCountFrequency (%)
Female 337033
48.3%
Male 328116
47.1%
(Missing) 31968
 
4.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
female 337033
50.7%
male 328116
49.3%

Most occurring characters

ValueCountFrequency (%)
e 1002182
30.1%
a 665149
19.9%
l 665149
19.9%
F 337033
 
10.1%
m 337033
 
10.1%
M 328116
 
9.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2669513
80.1%
Uppercase Letter 665149
 
19.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1002182
37.5%
a 665149
24.9%
l 665149
24.9%
m 337033
 
12.6%
Uppercase Letter
ValueCountFrequency (%)
F 337033
50.7%
M 328116
49.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 3334662
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1002182
30.1%
a 665149
19.9%
l 665149
19.9%
F 337033
 
10.1%
m 337033
 
10.1%
M 328116
 
9.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3334662
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 1002182
30.1%
a 665149
19.9%
l 665149
19.9%
F 337033
 
10.1%
m 337033
 
10.1%
M 328116
 
9.8%

EthnicGroup
Categorical

IMBALANCE  MISSING 

Distinct5
Distinct (%)< 0.1%
Missing211714
Missing (%)30.4%
Memory size36.1 MiB
African
449994 
Coloured
 
33263
White
 
1369
Other
 
666
Indian
 
111

Length

Max length8
Median length7
Mean length7.0599131
Min length5

Characters and Unicode

Total characters 3426903
Distinct characters 18
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowAfrican
2nd rowAfrican
3rd rowAfrican
4th rowAfrican
5th rowColoured

Common Values

ValueCountFrequency (%)
African 449994
64.6%
Coloured 33263
 
4.8%
White 1369
 
0.2%
Other 666
 
0.1%
Indian 111
 
< 0.1%
(Missing) 211714
30.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
african 449994
92.7%
coloured 33263
 
6.9%
white 1369
 
0.3%
other 666
 
0.1%
indian 111
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
r 483923
14.1%
i 451474
13.2%
n 450216
13.1%
a 450105
13.1%
A 449994
13.1%
c 449994
13.1%
f 449994
13.1%
o 66526
 
1.9%
e 35298
 
1.0%
d 33374
 
1.0%
Other values (8) 106005
 
3.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2941500
85.8%
Uppercase Letter 485403
 
14.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 483923
16.5%
i 451474
15.3%
n 450216
15.3%
a 450105
15.3%
c 449994
15.3%
f 449994
15.3%
o 66526
 
2.3%
e 35298
 
1.2%
d 33374
 
1.1%
l 33263
 
1.1%
Other values (3) 37333
 
1.3%
Uppercase Letter
ValueCountFrequency (%)
A 449994
92.7%
C 33263
 
6.9%
W 1369
 
0.3%
O 666
 
0.1%
I 111
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 3426903
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 483923
14.1%
i 451474
13.2%
n 450216
13.1%
a 450105
13.1%
A 449994
13.1%
c 449994
13.1%
f 449994
13.1%
o 66526
 
1.9%
e 35298
 
1.0%
d 33374
 
1.0%
Other values (8) 106005
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3426903
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 483923
14.1%
i 451474
13.2%
n 450216
13.1%
a 450105
13.1%
A 449994
13.1%
c 449994
13.1%
f 449994
13.1%
o 66526
 
1.9%
e 35298
 
1.0%
d 33374
 
1.0%
Other values (8) 106005
 
3.1%

HomeLanguage
Categorical

Distinct11
Distinct (%)< 0.1%
Missing226033
Missing (%)32.4%
Memory size35.9 MiB
isiXhosa
144041 
isiZulu
124320 
Setswana
55685 
Sepedi
41144 
Afrikaans
31598 
Other values (6)
74296 

Length

Max length10
Median length9
Mean length7.5932297
Min length6

Characters and Unicode

Total characters 3577049
Distinct characters 26
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowisiXhosa
2nd rowSetswana
3rd rowAfrikaans
4th rowAfrikaans
5th rowAfrikaans

Common Values

ValueCountFrequency (%)
isiXhosa 144041
20.7%
isiZulu 124320
17.8%
Setswana 55685
 
8.0%
Sepedi 41144
 
5.9%
Afrikaans 31598
 
4.5%
Sesotho 28194
 
4.0%
Xitsonga 14430
 
2.1%
English 9213
 
1.3%
Tshivenda 8917
 
1.3%
isiNdebele 8473
 
1.2%
(Missing) 226033
32.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
isixhosa 144041
30.6%
isizulu 124320
26.4%
setswana 55685
 
11.8%
sepedi 41144
 
8.7%
afrikaans 31598
 
6.7%
sesotho 28194
 
6.0%
xitsonga 14430
 
3.1%
english 9213
 
2.0%
tshivenda 8917
 
1.9%
isindebele 8473
 
1.8%

Most occurring characters

ValueCountFrequency (%)
i 669108
18.7%
s 573981
16.0%
a 347023
9.7%
u 248640
 
7.0%
o 214859
 
6.0%
e 200503
 
5.6%
h 190365
 
5.3%
X 158471
 
4.4%
l 142006
 
4.0%
S 130092
 
3.6%
Other values (16) 702001
19.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3105965
86.8%
Uppercase Letter 471084
 
13.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 669108
21.5%
s 573981
18.5%
a 347023
11.2%
u 248640
 
8.0%
o 214859
 
6.9%
e 200503
 
6.5%
h 190365
 
6.1%
l 142006
 
4.6%
n 119843
 
3.9%
t 103378
 
3.3%
Other values (9) 296259
9.5%
Uppercase Letter
ValueCountFrequency (%)
X 158471
33.6%
S 130092
27.6%
Z 124320
26.4%
A 31598
 
6.7%
E 9213
 
2.0%
T 8917
 
1.9%
N 8473
 
1.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 3577049
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 669108
18.7%
s 573981
16.0%
a 347023
9.7%
u 248640
 
7.0%
o 214859
 
6.0%
e 200503
 
5.6%
h 190365
 
5.3%
X 158471
 
4.4%
l 142006
 
4.0%
S 130092
 
3.6%
Other values (16) 702001
19.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3577049
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 669108
18.7%
s 573981
16.0%
a 347023
9.7%
u 248640
 
7.0%
o 214859
 
6.0%
e 200503
 
5.6%
h 190365
 
5.3%
X 158471
 
4.4%
l 142006
 
4.0%
S 130092
 
3.6%
Other values (16) 702001
19.6%

GrantType
Categorical

IMBALANCE  MISSING 

Distinct3
Distinct (%)< 0.1%
Missing11655
Missing (%)1.7%
Memory size44.3 MiB
Child Grant
604654 
None
79402 
Disability Grant
 
1406

Length

Max length16
Median length11
Mean length10.199395
Min length4

Characters and Unicode

Total characters 6991298
Distinct characters 18
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowChild Grant
2nd rowChild Grant
3rd rowChild Grant
4th rowChild Grant
5th rowChild Grant

Common Values

ValueCountFrequency (%)
Child Grant 604654
86.7%
None 79402
 
11.4%
Disability Grant 1406
 
0.2%
(Missing) 11655
 
1.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
grant 606060
46.9%
child 604654
46.8%
none 79402
 
6.1%
disability 1406
 
0.1%

Most occurring characters

ValueCountFrequency (%)
n 685462
9.8%
i 608872
8.7%
t 607466
8.7%
a 607466
8.7%
l 606060
8.7%
606060
8.7%
G 606060
8.7%
r 606060
8.7%
h 604654
8.6%
C 604654
8.6%
Other values (8) 848484
12.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5093716
72.9%
Uppercase Letter 1291522
 
18.5%
Space Separator 606060
 
8.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 685462
13.5%
i 608872
12.0%
t 607466
11.9%
a 607466
11.9%
l 606060
11.9%
r 606060
11.9%
h 604654
11.9%
d 604654
11.9%
o 79402
 
1.6%
e 79402
 
1.6%
Other values (3) 4218
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
G 606060
46.9%
C 604654
46.8%
N 79402
 
6.1%
D 1406
 
0.1%
Space Separator
ValueCountFrequency (%)
606060
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6385238
91.3%
Common 606060
 
8.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 685462
10.7%
i 608872
9.5%
t 607466
9.5%
a 607466
9.5%
l 606060
9.5%
G 606060
9.5%
r 606060
9.5%
h 604654
9.5%
C 604654
9.5%
d 604654
9.5%
Other values (7) 243830
 
3.8%
Common
ValueCountFrequency (%)
606060
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6991298
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 685462
9.8%
i 608872
8.7%
t 607466
8.7%
a 607466
8.7%
l 606060
8.7%
606060
8.7%
G 606060
8.7%
r 606060
8.7%
h 604654
8.6%
C 604654
8.6%
Other values (8) 848484
12.1%

PlaygroupGroup
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size41.1 MiB
None
522440 
Group A
126799 
Group B
 
47878

Length

Max length7
Median length4
Mean length4.7517117
Min length4

Characters and Unicode

Total characters 3312499
Distinct characters 11
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowGroup A
2nd rowNone
3rd rowNone
4th rowGroup A
5th rowGroup B

Common Values

ValueCountFrequency (%)
None 522440
74.9%
Group A 126799
 
18.2%
Group B 47878
 
6.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
none 522440
59.9%
group 174677
 
20.0%
a 126799
 
14.5%
b 47878
 
5.5%

Most occurring characters

ValueCountFrequency (%)
o 697117
21.0%
N 522440
15.8%
n 522440
15.8%
e 522440
15.8%
G 174677
 
5.3%
r 174677
 
5.3%
u 174677
 
5.3%
p 174677
 
5.3%
174677
 
5.3%
A 126799
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2266028
68.4%
Uppercase Letter 871794
 
26.3%
Space Separator 174677
 
5.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 697117
30.8%
n 522440
23.1%
e 522440
23.1%
r 174677
 
7.7%
u 174677
 
7.7%
p 174677
 
7.7%
Uppercase Letter
ValueCountFrequency (%)
N 522440
59.9%
G 174677
 
20.0%
A 126799
 
14.5%
B 47878
 
5.5%
Space Separator
ValueCountFrequency (%)
174677
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3137822
94.7%
Common 174677
 
5.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 697117
22.2%
N 522440
16.6%
n 522440
16.6%
e 522440
16.6%
G 174677
 
5.6%
r 174677
 
5.6%
u 174677
 
5.6%
p 174677
 
5.6%
A 126799
 
4.0%
B 47878
 
1.5%
Common
ValueCountFrequency (%)
174677
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3312499
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 697117
21.0%
N 522440
15.8%
n 522440
15.8%
e 522440
15.8%
G 174677
 
5.3%
r 174677
 
5.3%
u 174677
 
5.3%
p 174677
 
5.3%
174677
 
5.3%
A 126799
 
3.8%

InactiveReason
Categorical

IMBALANCE  MISSING 

Distinct5
Distinct (%)< 0.1%
Missing647833
Missing (%)92.9%
Memory size23.8 MiB
Franchisee left the programme
46768 
My child is starting Grade R
 
925
We are moving to a different area
 
740
Other
 
666
My child is starting Grade 1
 
185

Length

Max length33
Median length29
Mean length28.713213
Min length5

Characters and Unicode

Total characters 1415102
Distinct characters 26
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowFranchisee left the programme
2nd rowFranchisee left the programme
3rd rowFranchisee left the programme
4th rowFranchisee left the programme
5th rowFranchisee left the programme

Common Values

ValueCountFrequency (%)
Franchisee left the programme 46768
 
6.7%
My child is starting Grade R 925
 
0.1%
We are moving to a different area 740
 
0.1%
Other 666
 
0.1%
My child is starting Grade 1 185
 
< 0.1%
(Missing) 647833
92.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
franchisee 46768
23.4%
left 46768
23.4%
the 46768
23.4%
programme 46768
23.4%
my 1110
 
0.6%
child 1110
 
0.6%
is 1110
 
0.6%
starting 1110
 
0.6%
grade 1110
 
0.6%
r 925
 
0.5%
Other values (9) 6031
 
3.0%

Most occurring characters

ValueCountFrequency (%)
e 239316
16.9%
150294
10.6%
r 145410
10.3%
a 98716
 
7.0%
t 97902
 
6.9%
h 95312
 
6.7%
m 94276
 
6.7%
i 51578
 
3.6%
n 49358
 
3.5%
s 48988
 
3.5%
Other values (16) 343952
24.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1213304
85.7%
Space Separator 150294
 
10.6%
Uppercase Letter 51319
 
3.6%
Decimal Number 185
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 239316
19.7%
r 145410
12.0%
a 98716
8.1%
t 97902
8.1%
h 95312
 
7.9%
m 94276
 
7.8%
i 51578
 
4.3%
n 49358
 
4.1%
s 48988
 
4.0%
g 48618
 
4.0%
Other values (8) 243830
20.1%
Uppercase Letter
ValueCountFrequency (%)
F 46768
91.1%
M 1110
 
2.2%
G 1110
 
2.2%
R 925
 
1.8%
W 740
 
1.4%
O 666
 
1.3%
Space Separator
ValueCountFrequency (%)
150294
100.0%
Decimal Number
ValueCountFrequency (%)
1 185
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1264623
89.4%
Common 150479
 
10.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 239316
18.9%
r 145410
11.5%
a 98716
 
7.8%
t 97902
 
7.7%
h 95312
 
7.5%
m 94276
 
7.5%
i 51578
 
4.1%
n 49358
 
3.9%
s 48988
 
3.9%
g 48618
 
3.8%
Other values (14) 295149
23.3%
Common
ValueCountFrequency (%)
150294
99.9%
1 185
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1415102
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 239316
16.9%
150294
10.6%
r 145410
10.3%
a 98716
 
7.0%
t 97902
 
6.9%
h 95312
 
6.7%
m 94276
 
6.7%
i 51578
 
3.6%
n 49358
 
3.5%
s 48988
 
3.5%
Other values (16) 343952
24.3%

Status
Categorical

Distinct        1
Distinct (%)    < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     41.9 MiB

Active    697117

Length

Max length       6
Median length    6
Mean length      6
Min length       6

Characters and Unicode

Total characters       4182702
Distinct characters    6
Distinct categories    2
Distinct scripts       1
Distinct blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    Active
2nd row    Active
3rd row    Active
4th row    Active
5th row    Active

Common Values

Value     Count     Frequency (%)
Active    697117    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
active 697117
100.0%

Most occurring characters

ValueCountFrequency (%)
A 697117
16.7%
c 697117
16.7%
t 697117
16.7%
i 697117
16.7%
v 697117
16.7%
e 697117
16.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3485585
83.3%
Uppercase Letter 697117
 
16.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c 697117
20.0%
t 697117
20.0%
i 697117
20.0%
v 697117
20.0%
e 697117
20.0%
Uppercase Letter
ValueCountFrequency (%)
A 697117
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4182702
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 697117
16.7%
c 697117
16.7%
t 697117
16.7%
i 697117
16.7%
v 697117
16.7%
e 697117
16.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4182702
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 697117
16.7%
c 697117
16.7%
t 697117
16.7%
i 697117
16.7%
v 697117
16.7%
e 697117
16.7%

Franchisee.Guid
Categorical

Distinct        3639
Distinct (%)    0.5%
Missing         0
Missing (%)     0.0%
Memory size     61.8 MiB

9f6bbb81-c544-e911-828d-0800274bb0e4    2220
ef6e021c-2a79-ea11-833b-00155d326100    1850
e292d672-aa56-e811-817a-0800274bb0e4    1406
ade15bf6-4f4f-e711-80e2-005056815442    1369
89f720fc-edcc-eb11-8349-00155d326100    1221
Other values (3634)                     689051

Length

Max length       36
Median length    36
Mean length      36
Min length       36

Characters and Unicode

Total characters       25096212
Distinct characters    17
Distinct categories    3
Distinct scripts       2
Distinct blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    1e84c406-3deb-e911-8325-0800274bb0e4
2nd row    68814817-0705-ea11-8329-0800274bb0e4
3rd row    1e84c406-3deb-e911-8325-0800274bb0e4
4th row    1e84c406-3deb-e911-8325-0800274bb0e4
5th row    1e84c406-3deb-e911-8325-0800274bb0e4

Common Values

Value                                   Count     Frequency (%)
9f6bbb81-c544-e911-828d-0800274bb0e4    2220      0.3%
ef6e021c-2a79-ea11-833b-00155d326100    1850      0.3%
e292d672-aa56-e811-817a-0800274bb0e4    1406      0.2%
ade15bf6-4f4f-e711-80e2-005056815442    1369      0.2%
89f720fc-edcc-eb11-8349-00155d326100    1221      0.2%
006e23ea-36e3-e811-819a-0800274bb0e4    1221      0.2%
b87e51e8-d7d1-e811-8187-0800274bb0e4    1184      0.2%
90b93781-4704-ea11-8329-0800274bb0e4    1147      0.2%
708b55e3-1923-eb11-8345-00155d326100    1147      0.2%
03aabb7a-b565-ea11-833b-00155d326100    1147      0.2%
Other values (3629)                     683205    98.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
9f6bbb81-c544-e911-828d-0800274bb0e4 2220
 
0.3%
ef6e021c-2a79-ea11-833b-00155d326100 1850
 
0.3%
e292d672-aa56-e811-817a-0800274bb0e4 1406
 
0.2%
ade15bf6-4f4f-e711-80e2-005056815442 1369
 
0.2%
89f720fc-edcc-eb11-8349-00155d326100 1221
 
0.2%
006e23ea-36e3-e811-819a-0800274bb0e4 1221
 
0.2%
b87e51e8-d7d1-e811-8187-0800274bb0e4 1184
 
0.2%
90b93781-4704-ea11-8329-0800274bb0e4 1147
 
0.2%
708b55e3-1923-eb11-8345-00155d326100 1147
 
0.2%
03aabb7a-b565-ea11-833b-00155d326100 1147
 
0.2%
Other values (3629) 683205
98.0%

Most occurring characters

ValueCountFrequency (%)
0 3343320
13.3%
1 2824802
11.3%
- 2788468
11.1%
8 1801789
 
7.2%
5 1552446
 
6.2%
4 1540458
 
6.1%
e 1532725
 
6.1%
2 1359935
 
5.4%
3 1337402
 
5.3%
b 1275908
 
5.1%
Other values (7) 5738959
22.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 16562865
66.0%
Lowercase Letter 5744879
 
22.9%
Dash Punctuation 2788468
 
11.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 3343320
20.2%
1 2824802
17.1%
8 1801789
10.9%
5 1552446
9.4%
4 1540458
9.3%
2 1359935
8.2%
3 1337402
8.1%
6 1065452
 
6.4%
7 958115
 
5.8%
9 779146
 
4.7%
Lowercase Letter
ValueCountFrequency (%)
e 1532725
26.7%
b 1275908
22.2%
d 994819
17.3%
c 726865
12.7%
a 671143
11.7%
f 543419
 
9.5%
Dash Punctuation
ValueCountFrequency (%)
- 2788468
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 19351333
77.1%
Latin 5744879
 
22.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 3343320
17.3%
1 2824802
14.6%
- 2788468
14.4%
8 1801789
9.3%
5 1552446
8.0%
4 1540458
8.0%
2 1359935
7.0%
3 1337402
6.9%
6 1065452
 
5.5%
7 958115
 
5.0%
Latin
ValueCountFrequency (%)
e 1532725
26.7%
b 1275908
22.2%
d 994819
17.3%
c 726865
12.7%
a 671143
11.7%
f 543419
 
9.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 25096212
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 3343320
13.3%
1 2824802
11.3%
- 2788468
11.1%
8 1801789
 
7.2%
5 1552446
 
6.2%
4 1540458
 
6.1%
e 1532725
 
6.1%
2 1359935
 
5.4%
3 1337402
 
5.3%
b 1275908
 
5.1%
Other values (7) 5738959
22.9%

Caregiver.FullName
Categorical

Distinct        17723
Distinct (%)    2.5%
Missing         0
Missing (%)     0.0%
Memory size     65.7 MiB

12 12                   555
Jabulile Gamede         444
Mukelisiwe  Cele        370
1 1                     370
Faith Mofulwane         333
Other values (17718)    695045

Length

Max length       55
Median length    41
Mean length      17.792049
Min length       4

Characters and Unicode

Total characters       12403140
Distinct characters    79
Distinct categories    8
Distinct scripts       2
Distinct blocks        2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    Shieda Komani
2nd row    Mpho Ramohlabi
3rd row    Ragel De Bruin
4th row    Leatitia Zona
5th row    Esmeralda Klaaste

Common Values

Value                   Count     Frequency (%)
12 12                   555       0.1%
Jabulile Gamede         444       0.1%
Mukelisiwe  Cele        370       0.1%
1 1                     370       0.1%
Faith Mofulwane         333       < 0.1%
Mavis Macala            296       < 0.1%
Mandisa  Kheswa         259       < 0.1%
Vinolia Phiri           185       < 0.1%
Griet  Olifant          185       < 0.1%
n/a n/a                 148       < 0.1%
Other values (17713)    693972    99.5%
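
The common values above include non-name entries such as `12 12`, `1 1` and `n/a n/a`, and some names contain doubled spaces. The sketch below shows one possible way to normalise such values before further analysis. The function name `normalise_name` and the placeholder pattern are assumptions made for illustration, not rules taken from the report. The example name `Jane Doe` is made up.

```python
import re

# Assumed placeholder shapes: digit groups ("12 12", "1 1") or repeated "n/a".
# Extend this after reviewing the actual data.
PLACEHOLDER_RE = re.compile(r"^(?:\d+(?: \d+)*|n/a(?: n/a)*)$", re.IGNORECASE)


def normalise_name(raw):
    """Collapse whitespace runs and return None for placeholder entries."""
    # str.split() with no argument splits on any Unicode whitespace
    # (including U+00A0) and drops leading and trailing whitespace.
    cleaned = " ".join(raw.split())
    if not cleaned or PLACEHOLDER_RE.match(cleaned):
        return None
    return cleaned


print(normalise_name("Jane  Doe"))       # doubled space collapsed
print(normalise_name("Jane\u00a0 Doe"))  # no-break space plus space collapsed
print(normalise_name("12 12"))           # treated as a placeholder
```

Collapsing whitespace first means that names differing only in spacing are counted as one value, which would lower the distinct count reported above.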

Length

Histogram of lengths of the category
ValueCountFrequency (%)
dlamini 7326
 
0.5%
maria 6179
 
0.4%
ndlovu 5698
 
0.4%
sithole 5661
 
0.4%
ngubane 4736
 
0.3%
mkhize 4477
 
0.3%
lerato 4329
 
0.3%
zanele 4218
 
0.3%
khumalo 4144
 
0.3%
mahlangu 3959
 
0.3%
Other values (15180) 1476448
96.7%

Most occurring characters

ValueCountFrequency (%)
a 1356013
 
10.9%
e 1030117
 
8.3%
1020053
 
8.2%
i 867206
 
7.0%
o 789469
 
6.4%
n 698338
 
5.6%
  697117
 
5.6%
l 657786
 
5.3%
h 470825
 
3.8%
s 382765
 
3.1%
Other values (69) 4433451
35.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 9088421
73.3%
Space Separator 1717170
 
13.8%
Uppercase Letter 1562399
 
12.6%
Decimal Number 29563
 
0.2%
Dash Punctuation 3441
 
< 0.1%
Other Punctuation 1998
 
< 0.1%
Open Punctuation 74
 
< 0.1%
Close Punctuation 74
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1356013
14.9%
e 1030117
11.3%
i 867206
9.5%
o 789469
 
8.7%
n 698338
 
7.7%
l 657786
 
7.2%
h 470825
 
5.2%
s 382765
 
4.2%
u 356606
 
3.9%
t 355163
 
3.9%
Other values (21) 2124133
23.4%
Uppercase Letter
ValueCountFrequency (%)
M 355977
22.8%
N 223110
14.3%
S 140045
 
9.0%
T 87505
 
5.6%
K 73630
 
4.7%
L 72187
 
4.6%
P 67636
 
4.3%
B 67340
 
4.3%
A 61790
 
4.0%
D 55278
 
3.5%
Other values (17) 357901
22.9%
Decimal Number
ValueCountFrequency (%)
0 6549
22.2%
1 5180
17.5%
8 3959
13.4%
2 3737
12.6%
7 2220
 
7.5%
9 2035
 
6.9%
6 1591
 
5.4%
5 1517
 
5.1%
3 1406
 
4.8%
4 1369
 
4.6%
Other Punctuation
ValueCountFrequency (%)
. 962
48.1%
' 370
 
18.5%
/ 333
 
16.7%
, 148
 
7.4%
? 148
 
7.4%
& 37
 
1.9%
Space Separator
ValueCountFrequency (%)
1020053
59.4%
  697117
40.6%
Dash Punctuation
ValueCountFrequency (%)
- 3441
100.0%
Open Punctuation
ValueCountFrequency (%)
( 74
100.0%
Close Punctuation
ValueCountFrequency (%)
) 74
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10650820
85.9%
Common 1752320
 
14.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1356013
 
12.7%
e 1030117
 
9.7%
i 867206
 
8.1%
o 789469
 
7.4%
n 698338
 
6.6%
l 657786
 
6.2%
h 470825
 
4.4%
s 382765
 
3.6%
u 356606
 
3.3%
M 355977
 
3.3%
Other values (48) 3685718
34.6%
Common
ValueCountFrequency (%)
1020053
58.2%
  697117
39.8%
0 6549
 
0.4%
1 5180
 
0.3%
8 3959
 
0.2%
2 3737
 
0.2%
- 3441
 
0.2%
7 2220
 
0.1%
9 2035
 
0.1%
6 1591
 
0.1%
Other values (11) 6438
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11705801
94.4%
None 697339
 
5.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 1356013
 
11.6%
e 1030117
 
8.8%
1020053
 
8.7%
i 867206
 
7.4%
o 789469
 
6.7%
n 698338
 
6.0%
l 657786
 
5.6%
h 470825
 
4.0%
s 382765
 
3.3%
u 356606
 
3.0%
Other values (62) 4076623
34.8%
None
ValueCountFrequency (%)
  697117
> 99.9%
ź 37
 
< 0.1%
é 37
 
< 0.1%
ĺ 37
 
< 0.1%
ë 37
 
< 0.1%
Á 37
 
< 0.1%
ñ 37
 
< 0.1%

Caregiver.FirstName
Categorical

Distinct        9230
Distinct (%)    1.3%
Missing         185
Missing (%)     < 0.1%
Memory size     43.9 MiB

Maria                  2664
Lerato                 2553
Zanele                 2294
Mpho                   2294
Nthabiseng             2294
Other values (9225)    684833

Length

Max length       47
Median length    33
Mean length      9.0123699
Min length       1

Characters and Unicode

Total characters       6281009
Distinct characters    75
Distinct categories    8
Distinct scripts       2
Distinct blocks        2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    Shieda
2nd row    Mpho
3rd row    Ragel
4th row    Leatitia
5th row    Esmeralda

Common Values

Value                  Count     Frequency (%)
Maria                  2664      0.4%
Lerato                 2553      0.4%
Zanele                 2294      0.3%
Mpho                   2294      0.3%
Nthabiseng             2294      0.3%
Zandile                2109      0.3%
Nokuthula              1961      0.3%
Amanda                 1961      0.3%
Siphokazi              1961      0.3%
Andiswa                1813      0.3%
Other values (9220)    675028    96.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
maria 6142
 
0.7%
lerato 4329
 
0.5%
zanele 4218
 
0.5%
nthabiseng 3848
 
0.5%
mpho 3700
 
0.4%
thandeka 3700
 
0.4%
zandile 3515
 
0.4%
nokuthula 3256
 
0.4%
portia 3145
 
0.4%
nonhlanhla 3071
 
0.4%
Other values (7005) 784918
95.3%

Most occurring characters

ValueCountFrequency (%)
a 660265
 
10.5%
e 619824
 
9.9%
i 568320
 
9.0%
o 447219
 
7.1%
n 410885
 
6.5%
l 387242
 
6.2%
h 275428
 
4.4%
264550
 
4.2%
s 230547
 
3.7%
t 205905
 
3.3%
Other values (65) 2210824
35.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5157134
82.1%
Uppercase Letter 839123
 
13.4%
Space Separator 264550
 
4.2%
Decimal Number 16021
 
0.3%
Dash Punctuation 2664
 
< 0.1%
Other Punctuation 1443
 
< 0.1%
Open Punctuation 37
 
< 0.1%
Close Punctuation 37
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 660265
12.8%
e 619824
12.0%
i 568320
11.0%
o 447219
 
8.7%
n 410885
 
8.0%
l 387242
 
7.5%
h 275428
 
5.3%
s 230547
 
4.5%
t 205905
 
4.0%
u 182077
 
3.5%
Other values (19) 1169422
22.7%
Uppercase Letter
ValueCountFrequency (%)
N 139342
16.6%
M 96607
11.5%
S 75110
 
9.0%
T 55907
 
6.7%
L 51541
 
6.1%
A 49580
 
5.9%
P 49580
 
5.9%
B 42328
 
5.0%
K 36075
 
4.3%
Z 30525
 
3.6%
Other values (17) 212528
25.3%
Decimal Number
ValueCountFrequency (%)
0 3589
22.4%
1 2516
15.7%
8 2072
12.9%
2 2035
12.7%
7 1406
 
8.8%
6 999
 
6.2%
9 962
 
6.0%
5 888
 
5.5%
3 814
 
5.1%
4 740
 
4.6%
Other Punctuation
ValueCountFrequency (%)
. 814
56.4%
' 296
 
20.5%
, 148
 
10.3%
? 148
 
10.3%
& 37
 
2.6%
Space Separator
ValueCountFrequency (%)
264550
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2664
100.0%
Open Punctuation
ValueCountFrequency (%)
( 37
100.0%
Close Punctuation
ValueCountFrequency (%)
) 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5996257
95.5%
Common 284752
 
4.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 660265
 
11.0%
e 619824
 
10.3%
i 568320
 
9.5%
o 447219
 
7.5%
n 410885
 
6.9%
l 387242
 
6.5%
h 275428
 
4.6%
s 230547
 
3.8%
t 205905
 
3.4%
u 182077
 
3.0%
Other values (46) 2008545
33.5%
Common
ValueCountFrequency (%)
264550
92.9%
0 3589
 
1.3%
- 2664
 
0.9%
1 2516
 
0.9%
8 2072
 
0.7%
2 2035
 
0.7%
7 1406
 
0.5%
6 999
 
0.4%
9 962
 
0.3%
5 888
 
0.3%
Other values (9) 3071
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6280861
> 99.9%
None 148
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 660265
 
10.5%
e 619824
 
9.9%
i 568320
 
9.0%
o 447219
 
7.1%
n 410885
 
6.5%
l 387242
 
6.2%
h 275428
 
4.4%
264550
 
4.2%
s 230547
 
3.7%
t 205905
 
3.3%
Other values (61) 2210676
35.2%
None
ValueCountFrequency (%)
ĺ 37
25.0%
Á 37
25.0%
é 37
25.0%
ź 37
25.0%

Caregiver.Surname
Categorical

Distinct        9578
Distinct (%)    1.4%
Missing         185
Missing (%)     < 0.1%
Memory size     42.4 MiB

Dlamini                6401
Ndlovu                 4329
Sithole                3700
Mahlangu               3367
Khumalo                3145
Other values (9573)    675990

Length

Max length       23
Median length    20
Mean length      6.782491
Min length       1

Characters and Unicode

Total characters       4726935
Distinct characters    71
Distinct categories    8
Distinct scripts       2
Distinct blocks        2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    Komani
2nd row    Ramohlabi
3rd row    De Bruin
4th row    Zona
5th row    Klaaste

Common Values

Value                  Count     Frequency (%)
Dlamini                6401      0.9%
Ndlovu                 4329      0.6%
Sithole                3700      0.5%
Mahlangu               3367      0.5%
Khumalo                3145      0.5%
Mkhize                 2849      0.4%
Ngubane                2553      0.4%
Mokoena                2368      0.3%
Zulu                   2294      0.3%
Mbatha                 2109      0.3%
Other values (9568)    663817    95.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
dlamini 6845
 
1.0%
sithole 5439
 
0.8%
ndlovu 5439
 
0.8%
ngubane 4699
 
0.7%
mkhize 4403
 
0.6%
khumalo 3959
 
0.6%
mahlangu 3922
 
0.6%
mbatha 3145
 
0.4%
zulu 2812
 
0.4%
dladla 2775
 
0.4%
Other values (8956) 659562
93.8%

Most occurring characters

ValueCountFrequency (%)
a 695452
14.7%
e 410293
 
8.7%
o 342250
 
7.2%
i 298886
 
6.3%
n 287157
 
6.1%
l 270544
 
5.7%
M 259370
 
5.5%
h 195397
 
4.1%
u 174529
 
3.7%
s 152218
 
3.2%
Other values (61) 1640839
34.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3930695
83.2%
Uppercase Letter 723202
 
15.3%
Space Separator 58386
 
1.2%
Decimal Number 13542
 
0.3%
Dash Punctuation 777
 
< 0.1%
Other Punctuation 259
 
< 0.1%
Open Punctuation 37
 
< 0.1%
Close Punctuation 37
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 695452
17.7%
e 410293
10.4%
o 342250
 
8.7%
i 298886
 
7.6%
n 287157
 
7.3%
l 270544
 
6.9%
h 195397
 
5.0%
u 174529
 
4.4%
s 152218
 
3.9%
t 149258
 
3.8%
Other values (18) 954711
24.3%
Uppercase Letter
ValueCountFrequency (%)
M 259370
35.9%
N 83731
 
11.6%
S 64935
 
9.0%
K 37555
 
5.2%
T 31598
 
4.4%
D 31450
 
4.3%
B 25012
 
3.5%
L 20646
 
2.9%
G 19721
 
2.7%
P 18056
 
2.5%
Other values (16) 131128
18.1%
Decimal Number
ValueCountFrequency (%)
0 2960
21.9%
1 2664
19.7%
8 1887
13.9%
2 1702
12.6%
9 1073
 
7.9%
7 814
 
6.0%
5 629
 
4.6%
4 629
 
4.6%
6 592
 
4.4%
3 592
 
4.4%
Other Punctuation
ValueCountFrequency (%)
. 148
57.1%
' 74
28.6%
/ 37
 
14.3%
Space Separator
ValueCountFrequency (%)
58386
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 777
100.0%
Open Punctuation
ValueCountFrequency (%)
( 37
100.0%
Close Punctuation
ValueCountFrequency (%)
) 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4653897
98.5%
Common 73038
 
1.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 695452
14.9%
e 410293
 
8.8%
o 342250
 
7.4%
i 298886
 
6.4%
n 287157
 
6.2%
l 270544
 
5.8%
M 259370
 
5.6%
h 195397
 
4.2%
u 174529
 
3.8%
s 152218
 
3.3%
Other values (44) 1567801
33.7%
Common
ValueCountFrequency (%)
58386
79.9%
0 2960
 
4.1%
1 2664
 
3.6%
8 1887
 
2.6%
2 1702
 
2.3%
9 1073
 
1.5%
7 814
 
1.1%
- 777
 
1.1%
5 629
 
0.9%
4 629
 
0.9%
Other values (7) 1517
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4726861
> 99.9%
None 74
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 695452
14.7%
e 410293
 
8.7%
o 342250
 
7.2%
i 298886
 
6.3%
n 287157
 
6.1%
l 270544
 
5.7%
M 259370
 
5.5%
h 195397
 
4.1%
u 174529
 
3.7%
s 152218
 
3.2%
Other values (59) 1640765
34.7%
None
ValueCountFrequency (%)
ë 37
50.0%
ñ 37
50.0%

Caregiver.IdNumber
Categorical

HIGH CARDINALITY  MISSING 

Distinct        15284
Distinct (%)    2.5%
Missing         90058
Missing (%)     12.9%
Memory size     43.1 MiB

0000000000000           1221
12                      555
0                       481
000000000000            481
6702250614083           444
Other values (15279)    603877

Length

Max length       17
Median length    13
Mean length      12.784056
Min length       1

Characters and Unicode

Total characters       7760676
Distinct characters    48
Distinct categories    8
Distinct scripts       2
Distinct blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    9511070255085
2nd row    6004300574080
3rd row    8612290232085
4th row    9801020149086
5th row    0010220038086

Common Values

Value                   Count     Frequency (%)
0000000000000           1221      0.2%
12                      555       0.1%
0                       481       0.1%
000000000000            481       0.1%
6702250614083           444       0.1%
0000000000              407       0.1%
1                       370       0.1%
0000000000012           370       0.1%
8207271099080           333       < 0.1%
8312221057087           259       < 0.1%
Other values (15274)    602138    86.4%
(Missing)               90058     12.9%
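
Several of the most common values (`0000000000000`, `12`, `0`, `0000000000012`) look like placeholders rather than real identifiers, so the 12.9% missing rate probably understates how many rows lack a usable ID. The sketch below flags such entries. It assumes, which the report does not state, that the field is meant to hold 13-digit South African ID numbers whose last digit is a Luhn check digit. The function name `looks_like_valid_id` and the test value `8001015009087` are made up for illustration.

```python
def looks_like_valid_id(value):
    """Heuristic: 13 ASCII digits, not all zeros, and Luhn-valid."""
    if value is None:
        return False
    digits = value.strip()
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    if set(digits) == {"0"}:
        return False  # all-zero strings pass Luhn, so reject them explicitly
    total = 0
    # Luhn: from the right, double every second digit and subtract 9
    # when the doubled value exceeds 9, then sum everything.
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


for candidate in ["0000000000000", "12", "0000000000012", "8001015009087"]:
    print(candidate, looks_like_valid_id(candidate))
```

Rows that fail the check could be treated as missing for analysis purposes. Whether that is appropriate depends on how the source system collects the field.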

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0000000000000 1221
 
0.2%
12 555
 
0.1%
0 481
 
0.1%
000000000000 481
 
0.1%
6702250614083 444
 
0.1%
0000000000 407
 
0.1%
1 370
 
0.1%
0000000000012 370
 
0.1%
none 333
 
0.1%
8207271099080 333
 
0.1%
Other values (15297) 603766
99.2%

Most occurring characters

ValueCountFrequency (%)
0 2082878
26.8%
8 1183482
15.2%
1 979020
12.6%
9 675287
 
8.7%
2 652236
 
8.4%
7 451437
 
5.8%
3 439486
 
5.7%
5 437969
 
5.6%
6 434306
 
5.6%
4 409960
 
5.3%
Other values (38) 14615
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 7746061
99.8%
Uppercase Letter 8325
 
0.1%
Space Separator 1850
 
< 0.1%
Dash Punctuation 1591
 
< 0.1%
Lowercase Letter 1554
 
< 0.1%
Other Punctuation 1184
 
< 0.1%
Connector Punctuation 74
 
< 0.1%
Modifier Symbol 37
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 1998
24.0%
A 1147
13.8%
C 962
11.6%
M 703
 
8.4%
R 518
 
6.2%
D 481
 
5.8%
B 407
 
4.9%
F 370
 
4.4%
E 333
 
4.0%
T 296
 
3.6%
Other values (11) 1110
13.3%
Decimal Number
ValueCountFrequency (%)
0 2082878
26.9%
8 1183482
15.3%
1 979020
12.6%
9 675287
 
8.7%
2 652236
 
8.4%
7 451437
 
5.8%
3 439486
 
5.7%
5 437969
 
5.7%
6 434306
 
5.6%
4 409960
 
5.3%
Lowercase Letter
ValueCountFrequency (%)
n 481
31.0%
e 370
23.8%
o 370
23.8%
a 111
 
7.1%
m 37
 
2.4%
w 37
 
2.4%
i 37
 
2.4%
s 37
 
2.4%
k 37
 
2.4%
r 37
 
2.4%
Other Punctuation
ValueCountFrequency (%)
/ 1110
93.8%
. 37
 
3.1%
* 37
 
3.1%
Space Separator
ValueCountFrequency (%)
1850
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1591
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 74
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 7750797
99.9%
Latin 9879
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 1998
20.2%
A 1147
11.6%
C 962
 
9.7%
M 703
 
7.1%
R 518
 
5.2%
n 481
 
4.9%
D 481
 
4.9%
B 407
 
4.1%
F 370
 
3.7%
e 370
 
3.7%
Other values (21) 2442
24.7%
Common
ValueCountFrequency (%)
0 2082878
26.9%
8 1183482
15.3%
1 979020
12.6%
9 675287
 
8.7%
2 652236
 
8.4%
7 451437
 
5.8%
3 439486
 
5.7%
5 437969
 
5.7%
6 434306
 
5.6%
4 409960
 
5.3%
Other values (7) 4736
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7760676
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2082878
26.8%
8 1183482
15.2%
1 979020
12.6%
9 675287
 
8.7%
2 652236
 
8.4%
7 451437
 
5.8%
3 439486
 
5.7%
5 437969
 
5.6%
6 434306
 
5.6%
4 409960
 
5.3%
Other values (38) 14615
 
0.2%

Caregiver.ContactNumber
Categorical

HIGH CARDINALITY  MISSING 

Distinct        11312
Distinct (%)    2.4%
Missing         227587
Missing (%)     32.6%
Memory size     36.9 MiB

0                       4329
0000000000              2701
0681145763              1850
None                    1295
00000000                1147
Other values (11307)    458208

Length

Max length       22
Median length    10
Mean length      9.8550039
Min length       1

Characters and Unicode

Total characters       4627220
Distinct characters    33
Distinct categories    7
Distinct scripts       2
Distinct blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    0635118027
2nd row    0780401410
3rd row    0651332965
4th row    0719602840
5th row    0632598548

Common Values

Value                   Count     Frequency (%)
0                       4329      0.6%
0000000000              2701      0.4%
0681145763              1850      0.3%
None                    1295      0.2%
00000000                1147      0.2%
000000000               962       0.1%
0661469720              481       0.1%
0761475377              444       0.1%
0479392931              370       0.1%
+                       370       0.1%
Other values (11302)    455581    65.4%
(Missing)               227587    32.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0 4329
 
0.9%
0000000000 2701
 
0.6%
0681145763 1850
 
0.4%
none 1554
 
0.3%
00000000 1147
 
0.2%
000000000 962
 
0.2%
0661469720 481
 
0.1%
0761475377 444
 
0.1%
0479392931 370
 
0.1%
370
 
0.1%
Other values (11318) 456284
97.0%

Most occurring characters

ValueCountFrequency (%)
0 860583
18.6%
7 599067
12.9%
6 489362
10.6%
3 429163
9.3%
8 424945
9.2%
2 393939
8.5%
1 386724
8.4%
4 357642
7.7%
9 351648
7.6%
5 324823
 
7.0%
Other values (23) 9324
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4617896
99.8%
Lowercase Letter 5624
 
0.1%
Uppercase Letter 1813
 
< 0.1%
Space Separator 1184
 
< 0.1%
Math Symbol 370
 
< 0.1%
Other Punctuation 296
 
< 0.1%
Dash Punctuation 37
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1628
28.9%
n 1628
28.9%
o 1628
28.9%
a 185
 
3.3%
i 111
 
2.0%
u 111
 
2.0%
s 74
 
1.3%
q 74
 
1.3%
r 37
 
0.7%
b 37
 
0.7%
Other values (3) 111
 
2.0%
Decimal Number
ValueCountFrequency (%)
0 860583
18.6%
7 599067
13.0%
6 489362
10.6%
3 429163
9.3%
8 424945
9.2%
2 393939
8.5%
1 386724
8.4%
4 357642
7.7%
9 351648
7.6%
5 324823
 
7.0%
Uppercase Letter
ValueCountFrequency (%)
N 1591
87.8%
O 111
 
6.1%
H 37
 
2.0%
B 37
 
2.0%
Y 37
 
2.0%
Other Punctuation
ValueCountFrequency (%)
/ 259
87.5%
\ 37
 
12.5%
Space Separator
ValueCountFrequency (%)
1184
100.0%
Math Symbol
ValueCountFrequency (%)
+ 370
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 37
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 4619783
99.8%
Latin 7437
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1628
21.9%
n 1628
21.9%
o 1628
21.9%
N 1591
21.4%
a 185
 
2.5%
O 111
 
1.5%
i 111
 
1.5%
u 111
 
1.5%
s 74
 
1.0%
q 74
 
1.0%
Other values (8) 296
 
4.0%
Common
ValueCountFrequency (%)
0 860583
18.6%
7 599067
13.0%
6 489362
10.6%
3 429163
9.3%
8 424945
9.2%
2 393939
8.5%
1 386724
8.4%
4 357642
7.7%
9 351648
7.6%
5 324823
 
7.0%
Other values (5) 1887
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4627220
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 860583
18.6%
7 599067
12.9%
6 489362
10.6%
3 429163
9.3%
8 424945
9.2%
2 393939
8.5%
1 386724
8.4%
4 357642
7.7%
9 351648
7.6%
5 324823
 
7.0%
Other values (23) 9324
 
0.2%

Caregiver.RelationshipType
Categorical

IMBALANCE  MISSING 

Distinct        4
Distinct (%)    < 0.1%
Missing         278832
Missing (%)     40.0%
Memory size     33.7 MiB

Mother         371036
Guardian       26159
Father         11581
Grandparent    9509

Length

Max length       11
Median length    6
Mean length      6.2387439
Min length       6

Characters and Unicode

Total characters       2609573
Distinct characters    14
Distinct categories    2
Distinct scripts       1
Distinct blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    Mother
2nd row    Mother
3rd row    Mother
4th row    Mother
5th row    Mother

Common Values

Value          Count     Frequency (%)
Mother         371036    53.2%
Guardian       26159     3.8%
Father         11581     1.7%
Grandparent    9509      1.4%
(Missing)      278832    40.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
mother 371036
88.7%
guardian 26159
 
6.3%
father 11581
 
2.8%
grandparent 9509
 
2.3%

Most occurring characters

ValueCountFrequency (%)
r 427794
16.4%
t 392126
15.0%
e 392126
15.0%
h 382617
14.7%
M 371036
14.2%
o 371036
14.2%
a 82917
 
3.2%
n 45177
 
1.7%
G 35668
 
1.4%
d 35668
 
1.4%
Other values (4) 73408
 
2.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2191288
84.0%
Uppercase Letter 418285
 
16.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 427794
19.5%
t 392126
17.9%
e 392126
17.9%
h 382617
17.5%
o 371036
16.9%
a 82917
 
3.8%
n 45177
 
2.1%
d 35668
 
1.6%
u 26159
 
1.2%
i 26159
 
1.2%
Uppercase Letter
ValueCountFrequency (%)
M 371036
88.7%
G 35668
 
8.5%
F 11581
 
2.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 2609573
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 427794
16.4%
t 392126
15.0%
e 392126
15.0%
h 382617
14.7%
M 371036
14.2%
o 371036
14.2%
a 82917
 
3.2%
n 45177
 
1.7%
G 35668
 
1.4%
d 35668
 
1.4%
Other values (4) 73408
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2609573
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 427794
16.4%
t 392126
15.0%
e 392126
15.0%
h 382617
14.7%
M 371036
14.2%
o 371036
14.2%
a 82917
 
3.2%
n 45177
 
1.7%
G 35668
 
1.4%
d 35668
 
1.4%
Other values (4) 73408
 
2.8%

Caregiver.HighestEducationLevel
Categorical

IMBALANCE  MISSING 

Distinct        6
Distinct (%)    < 0.1%
Missing         519591
Missing (%)     74.5%
Memory size     27.0 MiB

No Matric             170940
Diploma               2627
NQF Level 4 ECD       1850
Higher Certificate    1332
Bachelors             740

Length

Max length       18
Median length    9
Mean length      9.1004585
Min length       7

Characters and Unicode

Total characters       1615568
Distinct characters    27
Distinct categories    4
Distinct scripts       2
Distinct blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    No Matric
2nd row    No Matric
3rd row    No Matric
4th row    No Matric
5th row    No Matric

Common Values

Value                 Count     Frequency (%)
No Matric             170940    24.5%
Diploma               2627      0.4%
NQF Level 4 ECD       1850      0.3%
Higher Certificate    1332      0.2%
Bachelors             740       0.1%
Doctorate             37        < 0.1%
(Missing)             519591    74.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
no 170940
48.1%
matric 170940
48.1%
diploma 2627
 
0.7%
nqf 1850
 
0.5%
level 1850
 
0.5%
4 1850
 
0.5%
ecd 1850
 
0.5%
higher 1332
 
0.4%
certificate 1332
 
0.4%
bachelors 740
 
0.2%

Most occurring characters

ValueCountFrequency (%)
177822
11.0%
i 177563
11.0%
a 175676
10.9%
r 174381
10.8%
o 174381
10.8%
t 173678
10.8%
c 173049
10.7%
N 172790
10.7%
M 170940
10.6%
e 8473
 
0.5%
Other values (17) 36815
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1074998
66.5%
Uppercase Letter 360898
 
22.3%
Space Separator 177822
 
11.0%
Decimal Number 1850
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 177563
16.5%
a 175676
16.3%
r 174381
16.2%
o 174381
16.2%
t 173678
16.2%
c 173049
16.1%
e 8473
 
0.8%
l 5217
 
0.5%
m 2627
 
0.2%
p 2627
 
0.2%
Other values (5) 7326
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
N 172790
47.9%
M 170940
47.4%
D 4514
 
1.3%
C 3182
 
0.9%
E 1850
 
0.5%
Q 1850
 
0.5%
L 1850
 
0.5%
F 1850
 
0.5%
H 1332
 
0.4%
B 740
 
0.2%
Space Separator
ValueCountFrequency (%)
177822
100.0%
Decimal Number
ValueCountFrequency (%)
4 1850
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1435896
88.9%
Common 179672
 
11.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 177563
12.4%
a 175676
12.2%
r 174381
12.1%
o 174381
12.1%
t 173678
12.1%
c 173049
12.1%
N 172790
12.0%
M 170940
11.9%
e 8473
 
0.6%
l 5217
 
0.4%
Other values (15) 29748
 
2.1%
Common
ValueCountFrequency (%)
177822
99.0%
4 1850
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1615568
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
177822
11.0%
i 177563
11.0%
a 175676
10.9%
r 174381
10.8%
o 174381
10.8%
t 173678
10.8%
c 173049
10.7%
N 172790
10.7%
M 170940
10.6%
e 8473
 
0.5%
Other values (17) 36815
 
2.3%
Distinct        11
Distinct (%)    0.1%
Missing         676212
Missing (%)     97.0%
Memory size     21.9 MiB

isiZulu             7363
isiXhosa            3515
Setswana            3256
Afrikaans           2183
Sepedi              2035
Other values (6)    2553

Length

Max length       10
Median length    9
Mean length      7.5026549
Min length       6

Characters and Unicode

Total characters       156843
Distinct characters    26
Distinct categories    2
Distinct scripts       1
Distinct blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    isiXhosa
2nd row    Afrikaans
3rd row    Afrikaans
4th row    Afrikaans
5th row    Afrikaans

Common Values

Value        Count     Frequency (%)
isiZulu      7363      1.1%
isiXhosa     3515      0.5%
Setswana     3256      0.5%
Afrikaans    2183      0.3%
Sepedi       2035      0.3%
siSwati      777       0.1%
Sesotho      666       0.1%
Xitsonga     481       0.1%
Tshivenda    296       < 0.1%
English      222       < 0.1%
(Missing)    676212    97.0%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
isizulu | 7363 | 35.2%
isixhosa | 3515 | 16.8%
setswana | 3256 | 15.6%
afrikaans | 2183 | 10.4%
sepedi | 2035 | 9.7%
siswati | 777 | 3.7%
sesotho | 666 | 3.2%
xitsonga | 481 | 2.3%
tshivenda | 296 | 1.4%
english | 222 | 1.1%

Most occurring characters

Value | Count | Frequency (%)
i | 28749 | 18.3%
s | 22385 | 14.3%
a | 15947 | 10.2%
u | 14726 | 9.4%
e | 8621 | 5.5%
l | 7696 | 4.9%
Z | 7363 | 4.7%
S | 6734 | 4.3%
n | 6438 | 4.1%
o | 5328 | 3.4%
Other values (16) | 32856 | 20.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 135938 | 86.7%
Uppercase Letter | 20905 | 13.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 28749 | 21.1%
s | 22385 | 16.5%
a | 15947 | 11.7%
u | 14726 | 10.8%
e | 8621 | 6.3%
l | 7696 | 5.7%
n | 6438 | 4.7%
o | 5328 | 3.9%
t | 5180 | 3.8%
h | 4699 | 3.5%
Other values (9) | 16169 | 11.9%
Uppercase Letter
Value | Count | Frequency (%)
Z | 7363 | 35.2%
S | 6734 | 32.2%
X | 3996 | 19.1%
A | 2183 | 10.4%
T | 296 | 1.4%
E | 222 | 1.1%
N | 111 | 0.5%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 156843 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 28749 | 18.3%
s | 22385 | 14.3%
a | 15947 | 10.2%
u | 14726 | 9.4%
e | 8621 | 5.5%
l | 7696 | 4.9%
Z | 7363 | 4.7%
S | 6734 | 4.3%
n | 6438 | 4.1%
o | 5328 | 3.4%
Other values (16) | 32856 | 20.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 156843 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
i | 28749 | 18.3%
s | 22385 | 14.3%
a | 15947 | 10.2%
u | 14726 | 9.4%
e | 8621 | 5.5%
l | 7696 | 4.9%
Z | 7363 | 4.7%
S | 6734 | 4.3%
n | 6438 | 4.1%
o | 5328 | 3.4%
Other values (16) | 32856 | 20.9%

Caregiver.Guid
Categorical

Distinct 18103
Distinct (%) 2.6%
Missing 0
Missing (%) 0.0%
Memory size 61.8 MiB
69516fc1-1271-eb11-8345-00155d326100 555
87ab4728-bc13-ec11-834c-00155d326100 444
339b9917-12ab-ea11-833e-00155d326100 370
e2f323de-ca74-ea11-833b-00155d326100 370
2d1a67d8-0b95-ea11-833c-00155d326100 333
Other values (18098) 695045

Length

Max length 36
Median length 36
Mean length 36
Min length 36

Characters and Unicode

Total characters 25096212
Distinct characters 17
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 3b69f593-3d43-ea11-8330-080027a7109a
2nd row 581e6f1a-bc45-ea11-833a-00155d326100
3rd row f32af543-b745-ea11-833a-00155d326100
4th row e99b7a2c-f945-ea11-833a-00155d326100
5th row 1254a16f-4f46-ea11-833a-00155d326100

Common Values

Value | Count | Frequency (%)
69516fc1-1271-eb11-8345-00155d326100 | 555 | 0.1%
87ab4728-bc13-ec11-834c-00155d326100 | 444 | 0.1%
339b9917-12ab-ea11-833e-00155d326100 | 370 | 0.1%
e2f323de-ca74-ea11-833b-00155d326100 | 370 | 0.1%
2d1a67d8-0b95-ea11-833c-00155d326100 | 333 | < 0.1%
adf213d5-5e83-ec11-8350-00155d326100 | 259 | < 0.1%
e82a1a5b-e8d4-eb11-8349-00155d326100 | 185 | < 0.1%
a97e0cef-ba9a-eb11-8346-00155d326100 | 148 | < 0.1%
642bd778-4d7f-eb11-8346-00155d326100 | 148 | < 0.1%
512b9642-8d09-ec11-834c-00155d326100 | 148 | < 0.1%
Other values (18093) | 694157 | 99.6%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
69516fc1-1271-eb11-8345-00155d326100 | 555 | 0.1%
87ab4728-bc13-ec11-834c-00155d326100 | 444 | 0.1%
339b9917-12ab-ea11-833e-00155d326100 | 370 | 0.1%
e2f323de-ca74-ea11-833b-00155d326100 | 370 | 0.1%
2d1a67d8-0b95-ea11-833c-00155d326100 | 333 | < 0.1%
adf213d5-5e83-ec11-8350-00155d326100 | 259 | < 0.1%
e82a1a5b-e8d4-eb11-8349-00155d326100 | 185 | < 0.1%
3f6bb4ba-ff78-ec11-834d-00155d326100 | 148 | < 0.1%
2a29e3e1-6567-ea11-833b-00155d326100 | 148 | < 0.1%
a5f94145-4e99-ec11-8351-00155d326100 | 148 | < 0.1%
Other values (18093) | 694157 | 99.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 3389718 | 13.5%
1 | 3334070 | 13.3%
- | 2788468 | 11.1%
5 | 2124096 | 8.5%
3 | 1947939 | 7.8%
8 | 1330668 | 5.3%
6 | 1300957 | 5.2%
d | 1280385 | 5.1%
e | 1206866 | 4.8%
2 | 1204905 | 4.8%
Other values (7) | 5188140 | 20.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 16921469 | 67.4%
Lowercase Letter | 5386275 | 21.5%
Dash Punctuation | 2788468 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 3389718 | 20.0%
1 | 3334070 | 19.7%
5 | 2124096 | 12.6%
3 | 1947939 | 11.5%
8 | 1330668 | 7.9%
6 | 1300957 | 7.7%
2 | 1204905 | 7.1%
4 | 987456 | 5.8%
9 | 711251 | 4.2%
7 | 590409 | 3.5%
Lowercase Letter
Value | Count | Frequency (%)
d | 1280385 | 23.8%
e | 1206866 | 22.4%
b | 883523 | 16.4%
c | 856772 | 15.9%
a | 656824 | 12.2%
f | 501905 | 9.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 2788468 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 19709937 | 78.5%
Latin | 5386275 | 21.5%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 3389718 | 17.2%
1 | 3334070 | 16.9%
- | 2788468 | 14.1%
5 | 2124096 | 10.8%
3 | 1947939 | 9.9%
8 | 1330668 | 6.8%
6 | 1300957 | 6.6%
2 | 1204905 | 6.1%
4 | 987456 | 5.0%
9 | 711251 | 3.6%
Latin
Value | Count | Frequency (%)
d | 1280385 | 23.8%
e | 1206866 | 22.4%
b | 883523 | 16.4%
c | 856772 | 15.9%
a | 656824 | 12.2%
f | 501905 | 9.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 25096212 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 3389718 | 13.5%
1 | 3334070 | 13.3%
- | 2788468 | 11.1%
5 | 2124096 | 8.5%
3 | 1947939 | 7.8%
8 | 1330668 | 5.3%
6 | 1300957 | 5.2%
d | 1280385 | 5.1%
e | 1206866 | 4.8%
2 | 1204905 | 4.8%
Other values (7) | 5188140 | 20.7%

Interactions


Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
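
One common way to read "nullity correlation" is as the Pearson correlation between per-column presence indicators (1 where a value is present, 0 where it is missing). The sketch below works under that assumption, using only the standard library. The function name `nullity_correlation` and the toy records are hypothetical, and the report's heatmap may have been computed differently.

```python
from math import sqrt

# Hypothetical toy records; None marks a missing cell.
rows = [
    {"EthnicGroup": "African", "HomeLanguage": "isiXhosa", "GrantType": "Child Grant"},
    {"EthnicGroup": None, "HomeLanguage": None, "GrantType": "Child Grant"},
    {"EthnicGroup": "Coloured", "HomeLanguage": "Afrikaans", "GrantType": None},
    {"EthnicGroup": None, "HomeLanguage": None, "GrantType": "Child Grant"},
]


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns None when either column is fully present or fully missing,
    because the correlation is undefined for a constant indicator.
    """
    a = [0.0 if r[col_a] is None else 1.0 for r in rows]
    b = [0.0 if r[col_b] is None else 1.0 for r in rows]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


same = nullity_correlation(rows, "EthnicGroup", "HomeLanguage")
opposed = nullity_correlation(rows, "EthnicGroup", "GrantType")
print(same, opposed)
```

A value near 1 means two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means their missingness is unrelated.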

Sample

Unnamed: 0GuidFullNameFirstNameSurnameIdNumberAllergyTypeDisabilityTypeHealthConditionsEmergencyContactNumberEmergencyContactFullNameEmergencyContactFirstNameEmergencyContactSurnameAlternativePickupFirstNameAlternativePickupSurnameAlternativePickupContactNumberBirthDateStartDateHasAllergyHasDisabilityCaregiverPopiaConsentCaregiverPhotographyAndFilmingConsentIsSouthAfricanCitizenHasIdNumberGenderEthnicGroupHomeLanguageGrantTypePlaygroupGroupInactiveReasonStatusFranchisee.GuidCaregiver.FullNameCaregiver.FirstNameCaregiver.SurnameCaregiver.IdNumberCaregiver.ContactNumberCaregiver.RelationshipTypeCaregiver.HighestEducationLevelCaregiver.LanguageCaregiver.Guid
000605e301-a345-ea11-833a-00155d326100Mxolisi komaniMxolisikomani0000000000012NaNNaNNaN0635118027Hans KoopmanNaNNaNNaNNaNNaN2017-02-16T22:00:00Z2020-01-17T00:00:00FalseNaNFalseFalseTrueFalseMaleAfricanisiXhosaChild GrantGroup AFranchisee left the programmeActive1e84c406-3deb-e911-8325-0800274bb0e4Shieda KomaniShiedaKomani95110702550850635118027MotherNo MatricisiXhosa3b69f593-3d43-ea11-8330-080027a7109a
115c1e6f1a-bc45-ea11-833a-00155d326100Thateho RamohlabiThatehoRamohlabi1807095666084NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2018-07-08T22:00:00Z2020-01-01T00:00:00FalseNaNFalseFalseFalseTrueMaleAfricanSetswanaChild GrantNoneFranchisee left the programmeActive68814817-0705-ea11-8329-0800274bb0e4Mpho RamohlabiMphoRamohlabiNaN0780401410MotherNo MatricNaN581e6f1a-bc45-ea11-833a-00155d326100
225637445f-eb45-ea11-833a-00155d326100Shenaaze van wykShenaazevan wyk0000000000012NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2016-04-10T22:00:00Z2020-01-17T00:00:00FalseNaNFalseFalseFalseFalseFemaleAfricanAfrikaansNaNNoneNaNActive1e84c406-3deb-e911-8325-0800274bb0e4Ragel De BruinRagelDe Bruin60043005740800651332965MotherNo MatricAfrikaansf32af543-b745-ea11-833a-00155d326100
334da208b6-fa45-ea11-833a-00155d326100Leatitia ZonaLeatitiaZona0000000000012NaNNaNNaN0714248050Valencia Van WykNaNNaNNaNNaNNaN2015-06-10T22:00:00Z2019-10-03T00:00:00FalseNaNFalseFalseTrueFalseMaleAfricanAfrikaansChild GrantGroup AFranchisee left the programmeActive1e84c406-3deb-e911-8325-0800274bb0e4Leatitia ZonaLeatitiaZona86122902320850719602840MotherNo MatricNaNe99b7a2c-f945-ea11-833a-00155d326100
44cdb4a38c-4f46-ea11-833a-00155d326100Avandro Pieter KlaasteAvandro PieterKlaaste1806226123086NaNNaNNaN0625698598Eugene LouwNaNNaNNaNNaNNaN2018-10-07T22:00:00Z2019-10-22T00:00:00FalseNaNTrueTrueFalseTrueMaleColouredAfrikaansChild GrantGroup BFranchisee left the programmeActive1e84c406-3deb-e911-8325-0800274bb0e4Esmeralda KlaasteEsmeraldaKlaaste98010201490860632598548MotherNo MatricNaN1254a16f-4f46-ea11-833a-00155d326100
552b427474-5046-ea11-833a-00155d326100Gillasha KoopmanGillashaKoopman1810270892081NaNNaNNaN0769598598Leandre KoopmanNaNNaNNaNNaNNaN2018-10-26T22:00:00Z2019-11-18T00:00:00FalseNaNTrueTrueTrueTrueFemaleColouredAfrikaansChild GrantGroup BFranchisee left the programmeActive8c168cb8-65ee-e911-8325-0800274bb0e4Leandre KoopmanLeandreKoopman00102200380860725969896MotherNo MatricNaN52264458-5046-ea11-833a-00155d326100
6652abcbd9-5046-ea11-833a-00155d326100Mpaballeng Happiness MayaMpaballeng HappinessMaya1805060468080NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2018-05-05T22:00:00Z2020-01-06T00:00:00FalseNaNFalseFalseFalseTrueFemaleAfricanNaNChild GrantNoneNaNActived5a7ba31-64ee-e911-8325-0800274bb0e4Mapaseka MayaMapasekaMaya88030205240870735119766NaNNaNNaNf68abdbd-5046-ea11-833a-00155d326100
776e17db16-5146-ea11-833a-00155d326100Prihano DavidsPrihanoDavids0000000000012NaNNaNNaN0738862330Filicia DawidNaNNaNNaNNaNNaN2017-03-17T22:00:00Z2019-11-13T00:00:00FalseNaNTrueTrueFalseFalseMaleColouredAfrikaansChild GrantGroup BFranchisee left the programmeActive8c168cb8-65ee-e911-8325-0800274bb0e4Filicia DavidFiliciaDavid83091501390840738862330MotherNo MatricNaNd408dcf1-5046-ea11-833a-00155d326100
886aab0708-5246-ea11-833a-00155d326100Kim-lee WolmaransKim-leeWolmarans1805280403085NaNNaNNaN07458996856DelixaNaNNaNNaNNaNNaN2018-05-27T22:00:00Z2019-11-19T00:00:00FalseNaNTrueTrueTrueTrueFemaleColouredAfrikaansChild GrantGroup BFranchisee left the programmeActive8c168cb8-65ee-e911-8325-0800274bb0e4Delixa WolmaransDelixaWolmarans83082501620870712569698MotherNo MatricNaN92e45ce2-5146-ea11-833a-00155d326100
9963745080-5246-ea11-833a-00155d326100Leonardo JansenLeonardoJansen1712036113083NaNNaNNaN081255889JuanettaNaNNaNNaNNaNNaN2017-12-02T22:00:00Z2019-11-20T00:00:00FalseNaNTrueTrueTrueTrueMaleColouredAfrikaansChild GrantGroup AFranchisee left the programmeActive1e84c406-3deb-e911-8325-0800274bb0e4Juanetta JansenJuanettaJansen93050402470860725698856MotherNo MatricNaN89ccd869-5246-ea11-833a-00155d326100
Unnamed: 0GuidFullNameFirstNameSurnameIdNumberAllergyTypeDisabilityTypeHealthConditionsEmergencyContactNumberEmergencyContactFullNameEmergencyContactFirstNameEmergencyContactSurnameAlternativePickupFirstNameAlternativePickupSurnameAlternativePickupContactNumberBirthDateStartDateHasAllergyHasDisabilityCaregiverPopiaConsentCaregiverPhotographyAndFilmingConsentIsSouthAfricanCitizenHasIdNumberGenderEthnicGroupHomeLanguageGrantTypePlaygroupGroupInactiveReasonStatusFranchisee.GuidCaregiver.FullNameCaregiver.FirstNameCaregiver.SurnameCaregiver.IdNumberCaregiver.ContactNumberCaregiver.RelationshipTypeCaregiver.HighestEducationLevelCaregiver.LanguageCaregiver.Guid
69710718831da87f56b-45a5-ec11-8351-00155d326100Ntobiso AmahleNtobisoAmahle2012231507081NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2020-12-22T22:00:00Z2022-03-16T00:00:00FalseFalseFalseFalseFalseTrueFemaleNaNNaNChild GrantNoneNaNActive08e7b636-cd8b-e711-80e2-005056815442Lungisile DhlaminiLungisileDhlamini8504071169083NaNNaNNaNNaNd687f56b-45a5-ec11-8351-00155d326100
69710818832ce14e8db-47a5-ec11-8351-00155d326100Kwenziwe DlaminiKwenziweDlamini1903206744082NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2019-03-19T22:00:00Z2022-03-16T00:00:00FalseFalseFalseFalseFalseTrueMaleNaNNaNChild GrantNoneNaNActivee9861a57-7aef-e611-80d3-005056815442Nondumiso DlaminiNondumisoDlamini9311170411088NaNNaNNaNNaNca14e8db-47a5-ec11-8351-00155d326100
697109188333c70c606-55a5-ec11-8351-00155d326100Thandolwethu SekonyelaThandolwethuSekonyela1811146251086NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2018-11-13T22:00:00Z2022-03-16T00:00:00FalseFalseFalseFalseFalseTrueMaleNaNNaNChild GrantNoneNaNActive33c90931-6409-ec11-834c-00155d326100Nomasonto SekonyelaNomasontoSekonyela9009090371089NaNNaNNaNNaN3870c606-55a5-ec11-8351-00155d326100
69711018834cfe1de2d-57a5-ec11-8351-00155d326100ratile letholeratileletholeNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2022-03-16T00:00:00FalseFalseFalseFalseFalseTrueNaNNaNNaNChild GrantNoneNaNActiveed304ca3-eff1-e611-80d3-005056815442puleng evodia letholepuleng evodialethole8704100724086NaNNaNNaNNaNcbe1de2d-57a5-ec11-8351-00155d326100
69711118835d4ad822e-bfa5-ec11-8351-00155d326100Skylar HornSkylarHorn2003111425080NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2020-03-10T22:00:00Z2022-03-17T00:00:00FalseFalseFalseFalseFalseTrueFemaleNaNNaNChild GrantNoneNaNActived38e0456-2f42-e911-828d-0800274bb0e4Mary-Ann HornMary-AnnHorn92011800650830655907575MotherNaNNaNe6fcf2f9-5797-eb11-8346-00155d326100
69711218836e2d8ae07-c6a5-ec11-8351-00155d326100Lwandle Ntuli NtuliLwandle NtuliNtuli1711075092081NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2017-11-06T22:00:00Z2022-03-17T00:00:00FalseFalseFalseFalseFalseTrueMaleNaNNaNChild GrantNoneNaNActiveef759b16-c60a-ea11-8329-0800274bb0e4Thulisile NtuliThulisileNtuli9008060420084NaNNaNNaNNaNded8ae07-c6a5-ec11-8351-00155d326100
6971131883763d0fbf7-c6a5-ec11-8351-00155d326100Ntando Kearabetswe ThwalaNtando KearabetsweThwala1708200745081NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2017-08-19T22:00:00Z2022-03-17T00:00:00FalseFalseFalseFalseFalseTrueFemaleNaNNaNChild GrantNoneNaNActiveef759b16-c60a-ea11-8329-0800274bb0e4Lydia ThwalaLydiaThwala4503290442085NaNNaNNaNNaN5ad0fbf7-c6a5-ec11-8351-00155d326100
69711418838bc2c8931-c9a5-ec11-8351-00155d326100Nkanyezi Zimkhitha ZamisaNkanyezi ZimkhithaZamisa1805190953084NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2018-05-18T22:00:00Z2022-03-17T00:00:00FalseFalseFalseFalseFalseTrueFemaleNaNNaNChild GrantNoneNaNActived8995a5c-3b5c-e911-82e3-0800274bb0e4Thembekile ZamisaThembekileZamisa9503090119086NaNNaNNaNNaNb82c8931-c9a5-ec11-8351-00155d326100
6971151883971b1535a-caa5-ec11-8351-00155d326100Pelontle Felicia TumaeletsePelontle FeliciaTumaeletse1812210369085NaNNaNNaN0797291366Tshepang TumaeletseNaNNaNNaNNaNNaN2018-12-20T22:00:00Z2022-01-24T00:00:00FalseFalseFalseTrueTrueTrueFemaleAfricanSetswanaChild GrantGroup ANaNActive31aa94db-98fc-e911-8329-0800274bb0e4Tshepang  TumaeletseTshepangTumaeletse00040911910820762988267MotherNo MatricNaNa06bc816-caa5-ec11-8351-00155d326100
697116188409e83ea9d-cca5-ec11-8351-00155d326100Rachel MadondoRachelMadondo1908170000001NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2019-08-16T22:00:00Z2022-03-17T00:00:00FalseFalseFalseFalseFalseTrueFemaleNaNNaNChild GrantNoneNaNActived8995a5c-3b5c-e911-82e3-0800274bb0e4Kumbirai MuteroKumbiraiMutero8506200000002NaNNaNNaNNaN9a83ea9d-cca5-ec11-8351-00155d326100

Duplicate rows

Most frequently occurring

Unnamed: 0GuidFullNameFirstNameSurnameIdNumberAllergyTypeDisabilityTypeEmergencyContactNumberEmergencyContactFullNameAlternativePickupContactNumberBirthDateStartDateHasAllergyHasDisabilityCaregiverPopiaConsentCaregiverPhotographyAndFilmingConsentIsSouthAfricanCitizenHasIdNumberGenderEthnicGroupHomeLanguageGrantTypePlaygroupGroupInactiveReasonStatusFranchisee.GuidCaregiver.FullNameCaregiver.FirstNameCaregiver.SurnameCaregiver.IdNumberCaregiver.ContactNumberCaregiver.RelationshipTypeCaregiver.HighestEducationLevelCaregiver.LanguageCaregiver.Guid# duplicates
000605e301-a345-ea11-833a-00155d326100Mxolisi komaniMxolisikomani0000000000012NaNNaN0635118027Hans KoopmanNaN2017-02-16T22:00:00Z2020-01-17T00:00:00FalseNaNFalseFalseTrueFalseMaleAfricanisiXhosaChild GrantGroup AFranchisee left the programmeActive1e84c406-3deb-e911-8325-0800274bb0e4Shieda KomaniShiedaKomani95110702550850635118027MotherNo MatricisiXhosa3b69f593-3d43-ea11-8330-080027a7109a37
115c1e6f1a-bc45-ea11-833a-00155d326100Thateho RamohlabiThatehoRamohlabi1807095666084NaNNaNNaNNaNNaN2018-07-08T22:00:00Z2020-01-01T00:00:00FalseNaNFalseFalseFalseTrueMaleAfricanSetswanaChild GrantNoneFranchisee left the programmeActive68814817-0705-ea11-8329-0800274bb0e4Mpho RamohlabiMphoRamohlabiNaN0780401410MotherNo MatricNaN581e6f1a-bc45-ea11-833a-00155d32610037
225637445f-eb45-ea11-833a-00155d326100Shenaaze van wykShenaazevan wyk0000000000012NaNNaNNaNNaNNaN2016-04-10T22:00:00Z2020-01-17T00:00:00FalseNaNFalseFalseFalseFalseFemaleAfricanAfrikaansNaNNoneNaNActive1e84c406-3deb-e911-8325-0800274bb0e4Ragel De BruinRagelDe Bruin60043005740800651332965MotherNo MatricAfrikaansf32af543-b745-ea11-833a-00155d32610037
334da208b6-fa45-ea11-833a-00155d326100Leatitia ZonaLeatitiaZona0000000000012NaNNaN0714248050Valencia Van WykNaN2015-06-10T22:00:00Z2019-10-03T00:00:00FalseNaNFalseFalseTrueFalseMaleAfricanAfrikaansChild GrantGroup AFranchisee left the programmeActive1e84c406-3deb-e911-8325-0800274bb0e4Leatitia ZonaLeatitiaZona86122902320850719602840MotherNo MatricNaNe99b7a2c-f945-ea11-833a-00155d32610037
44cdb4a38c-4f46-ea11-833a-00155d326100Avandro Pieter KlaasteAvandro PieterKlaaste1806226123086NaNNaN0625698598Eugene LouwNaN2018-10-07T22:00:00Z2019-10-22T00:00:00FalseNaNTrueTrueFalseTrueMaleColouredAfrikaansChild GrantGroup BFranchisee left the programmeActive1e84c406-3deb-e911-8325-0800274bb0e4Esmeralda KlaasteEsmeraldaKlaaste98010201490860632598548MotherNo MatricNaN1254a16f-4f46-ea11-833a-00155d32610037
552b427474-5046-ea11-833a-00155d326100Gillasha KoopmanGillashaKoopman1810270892081NaNNaN0769598598Leandre KoopmanNaN2018-10-26T22:00:00Z2019-11-18T00:00:00FalseNaNTrueTrueTrueTrueFemaleColouredAfrikaansChild GrantGroup BFranchisee left the programmeActive8c168cb8-65ee-e911-8325-0800274bb0e4Leandre KoopmanLeandreKoopman00102200380860725969896MotherNo MatricNaN52264458-5046-ea11-833a-00155d32610037
6652abcbd9-5046-ea11-833a-00155d326100Mpaballeng Happiness MayaMpaballeng HappinessMaya1805060468080NaNNaNNaNNaNNaN2018-05-05T22:00:00Z2020-01-06T00:00:00FalseNaNFalseFalseFalseTrueFemaleAfricanNaNChild GrantNoneNaNActived5a7ba31-64ee-e911-8325-0800274bb0e4Mapaseka MayaMapasekaMaya88030205240870735119766NaNNaNNaNf68abdbd-5046-ea11-833a-00155d32610037
776e17db16-5146-ea11-833a-00155d326100Prihano DavidsPrihanoDavids0000000000012NaNNaN0738862330Filicia DawidNaN2017-03-17T22:00:00Z2019-11-13T00:00:00FalseNaNTrueTrueFalseFalseMaleColouredAfrikaansChild GrantGroup BFranchisee left the programmeActive8c168cb8-65ee-e911-8325-0800274bb0e4Filicia DavidFiliciaDavid83091501390840738862330MotherNo MatricNaNd408dcf1-5046-ea11-833a-00155d32610037
886aab0708-5246-ea11-833a-00155d326100Kim-lee WolmaransKim-leeWolmarans1805280403085NaNNaN07458996856DelixaNaN2018-05-27T22:00:00Z2019-11-19T00:00:00FalseNaNTrueTrueTrueTrueFemaleColouredAfrikaansChild GrantGroup BFranchisee left the programmeActive8c168cb8-65ee-e911-8325-0800274bb0e4Delixa WolmaransDelixaWolmarans83082501620870712569698MotherNo MatricNaN92e45ce2-5146-ea11-833a-00155d32610037
9963745080-5246-ea11-833a-00155d326100Leonardo JansenLeonardoJansen1712036113083NaNNaN081255889JuanettaNaN2017-12-02T22:00:00Z2019-11-20T00:00:00FalseNaNTrueTrueTrueTrueMaleColouredAfrikaansChild GrantGroup AFranchisee left the programmeActive1e84c406-3deb-e911-8325-0800274bb0e4Juanetta JansenJuanettaJansen93050402470860725698856MotherNo MatricNaN89ccd869-5246-ea11-833a-00155d32610037
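
A "# duplicates" column like the one above could be produced by grouping identical rows and counting each group. The sketch below assumes that "# duplicates" is the number of times an identical row occurs; the function name `duplicate_counts` and the toy records are hypothetical. That reading fits the overview figures, since 18841 distinct rows occurring 37 times each gives 18841 × 37 = 697117 observations.

```python
from collections import Counter

# Hypothetical records, each represented as a tuple of column values.
records = [
    ("g1", "Group A"),
    ("g2", "Group B"),
    ("g1", "Group A"),
    ("g1", "Group A"),
    ("g2", "Group B"),
    ("g3", None),
]


def duplicate_counts(records):
    """Return (row, occurrences) pairs for rows seen more than once,
    most frequent first."""
    counts = Counter(records)
    dupes = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(dupes, key=lambda item: item[1], reverse=True)


result = duplicate_counts(records)
for row, n in result:
    print(row, n)
```

Rows that occur only once, such as the `g3` record here, do not appear in the result.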